Preface


This book is about the place and role of combinators in linguistics, and 
through it, in cognitive science, computational linguistics and philosophy. It 
traces the history of Combinatory Categorial Grammar (CCG) and presents 
its linguistic implications. It aims to show how combinatory theories and 
models can be built, evaluated and situated in the realm of the four fields. 
The introductory remarks in the beginnings of early chapters can hopefully 
be excused because of the wide target readership. 

The book examines to what extent knowledge of words can be construed 
as the knowledge of language, and what that knowledge might look like, at 
least on paper. It studies the semantic mechanism that engenders directly in- 
terpretable constituents, the combinators, and their limits in a grammar. More 
specifically, it investigates the mediating relation between constituents and 
their semantics insofar as it arises from combinatory knowledge of words 
and syntacticized combinators. It is not about forms or meanings per se. 

Its key aspect is to promote the following question as a basic scientific 
inquiry of language: why do we see limited dependency and constituency in 
natural language syntax? We owe the question to David Hume by a series 
of links, some of which are covered in the book. The reader might be puz- 
zled by this claim, knowing that Hume had said very little about language. 
I believe he avoided it for a good reason, but the question goes back to him 
nevertheless, as I try to argue in the book. 

It seems that thinking syntax is syntax and semantics is semantics in their 
own structure isn't going to take us too far from the knowledge we have ac- 
cumulated on grammars, about what they can and cannot do regarding code- 
terminism in forms and meanings, and about the coconspiracy of forms and 
meanings. The same goes, I am sure, for thinking discourse is discourse, morphology
is morphology, etc. The book focuses on the relationship between
syntax and semantics. 

Many explanans about syntactic processes become explananda when we 
readjust our semantic radar, a term which I use as a metaphor for looking at 
semantic objects with a syntactic eye. Like all metaphors, it is somewhat
misleading in the beginning, but I hope it becomes less of a metaphor as
we proceed. If we open the radar too wide, we are forced to do syntax with
semantic types, and run the risk of missing the intricate and complex syntactic
dependencies, which in turn might miss an opportunity to limit "possible
human languages". If it is too narrow, we must do semantics with syntac- 
tic types, and that might take us to the point of having syntaxes rather than 
syntax. Both extremes need auxiliary assumptions to provide a constrained 
theory of language. 

Many syntactic dependencies turn out to be semantic in nature, and these 
dependencies seem to arise from a single resource. This resource is conjec- 
tured to be adjacency. The conjecture of semantics arising from order goes 
back about a century in mathematics, to Schönfinkel, and almost half a century
in philosophy, linguistics and cognitive science, to Geach, Ades and
Steedman. The natural meeting point of the two historically independent
lines of theorizing about adjacency, the semantic one and the syntactic one
about combinators, is the main story of the book.

In this regard, the book was bound to be a historical account from the 
beginning. However, it came to provide, in some detail, ways of theory and 
model construction for linguistics and cognitive science in which there is 
no degree of freedom from adjacency. This pertinacious course seems to set 
up the crucial link between forms and meanings with as few auxiliary assumptions
as its progenitors can think of. I believe it sets up creative links
in theorizing about the computational, linguistic, cognitive and philosophical 
aspects of grammar. I exemplify these connections one by one. 

When we look at combinators as functions they are too powerful, equiv- 
alent to the power of a Turing machine. As such they cannot do linguistic 
work because natural language constituency narrows down the expressible 
semantic dependencies manifested by functions. The linguistic theorizing be- 
gins when we syntacticize the combinators and establish some criteria about 
which combinator must be in the grammar and which one can materialize in 
the lexicon. An explanatory force can be reached if the process reveals predic- 
tions about possible constituents, possible grammars and possible lexicons, 
without the need for variables and within a limited computational power. 
Structure-dependence of natural language strings can be predicted too, rather 
than assumed. 

Every intermediate constituent will be immediately interpretable, and non- 
constituents will be uninterpretable by this process. In other words, being 
a constituent, being derivable and being immediately interpretable are three 
manifestations of the same property: natural grammars are combinatory type- 
dependent. These are the narrow claims of Combinatory Categorial Grammar.

The notion of grammar is trivialized if there is no semantics in it. Some,
like Montague, went as far as claiming that syntax is only a preliminary for 
semantics. On the other hand, language would not be worth theorizing about 
if the semantics we have in mind for grammars is the semantics out there. All 
species do this all the time without language, to mean things as they see fit. 
Words would be very unnecessary, as one songwriter put it in the early 90's.
Perhaps they are not always there, as in the lyrics of Elizabeth Fraser. Sadly,
words are needed for us mortals, and somewhat surprisingly, they are more 
or less sufficient, if we take them as personal interrelated histories of what 
connects the dots in syntax and compositional semantics, which is embod- 
ied in their syntactic combinatory type, as knowledge arising more than the 
experience. Herein lies a Humean story. 

Although I have tried, for the sake of brevity and focus, to keep comparison
of the present theory with others to a minimum, the historical perspective
in the book makes points of contact with different ways of theorizing
about grammars unavoidable. Some examples are worth noting from the beginning.
(a) Steedman's and Jacobson's use of combinators for syntax differs when 
it comes to reference and quantifier scope. (b) Kayne claims that structure 
determines order, with directionally-constrained syntactic movement being 
the key element in explanations. Order determines structure in the combinatory
theory, and no movement is the key to explanations. (c) HPSG is another
type-dependent theory of syntax like the one presented in the book. HPSG's 
types are related to each other by subtyping, whose semantics do not nec- 
essarily arise from order. (d) Type-logical grammar in particular and Mon- 
tague's use of type-theoretic language in general use semantic types to give 
rise to meaningful expressions, that is, to syntax. Order is not necessary or 
sufficient for a set-based type's construal, therefore it need not be the basis 
for meaningful expressions. (e) Obviously not all categorial grammars are 
combinatory categorial grammars. The telltale signs of the latter kind, which 
is the main topic of the book, are no use of phonologically null types, no use 
of surface wrap, some use of type combination that goes above function ap- 
plication, and the insistence on an order-induced syntax-semantics for every 
rule and lexical category, as opposed to for example order and structural unifi- 
cation. (f) Dependency grammars take dependency as an asymmetric relation 
of words in a string, i.e. as a semantic relation between syntactic objects, but 
leave open why there are limited kinds of dependencies, and why these de- 
pendencies relate to surface constituency and interpretability in predictable 
ways. (g) Chomsky's program can be seen as a concerted effort to squeeze as
much compositional semantics into syntax as possible. The A-over-A principle,
the X-bar model, subjacency, cyclicity, filters, functional categories, main
thematic condition, chains, crash and the process of derivation-by-phase do 
to syntactic trees what they cannot do by themselves: constrain the possible 
semantic interpretations of the syntactic objects in them hypergrammatically. 

The apparent similarities of these theories must be put in context. As Pol- 
lard points out frequently, most theories subscribe to some form of syntac- 
tocentrism because they conceive the relation between forms and meanings 
as indirect. It must be mediated by syntax. The theory covered in the book is 
syntactocentric in Pollard's sense. The syntactocentrism that will be argued 
against here is the one that sees semantics as an appendix to syntax. The 
theory presented here is neither the first nor the only remaining one on this 
stance. 

We need only look thirtysomething years before the rise of that kind of
syntactocentrism to find an alternative foundation for bringing semantics into
syntax. Two historically independent programs, radical lexicalization and 
codeterminacy of syntax and semantics, culminate in a theory where adja- 
cency is the only fundamental assumption. Two aspects will figure promi- 
nently: dependency and constituency. Both will get their interpretation from 
a single source, the semantics of order. 

For the reader: the book is organized in such a way that the technical ma- 
terial that gets in the way of linguistics has been moved to appendices. This 
leaves some aspects of combinators, grammars and computing to the appen- 
dices (mostly definitions and basic techniques). Linguistic theorizing about 
the combinators is in the main text. There is no reference to the appendices 
from the main matter, or from appendices to the chapters in the main body. 
The back matter might refer to earlier ones. Reading all the appendices in the 
given order might help readers who are unfamiliar with some of the terminol- 
ogy. 

Now to pay some debts, old and new, academic and personal. This book
started as my I-see-what-they-mean project, although I am not sure about the
end result. It was an attempt at a collective understanding of Moses Schönfinkel,
Mark Steedman, Anna Szabolcsi, Pauline Jacobson, Noam Chomsky,
Richard Montague, Haskell Curry, Emmon Bach and John Robert Ross,
among others. I hope the reader does not visit my shortcomings on them. 

At a more personal level, my first contact with Mark and his theory was 
in the years 1992-1994, and since then it has become a major part of my 
academic life. I have asked Mark so many questions that I am slightly
embarrassed I am getting away with an acknowledgment. Before then I was
fortunate to be taught by great teachers, whom I’m honored to list in somewhat
chronological order: Türkân Barkin, Metin Unver, Ibrahim Nisanci,
the late Esen Ozkarahan, Nicholas Findler and Leonard ‘Aryeh’ Faltz. Some
friends and family taught me more on academic affairs than I was able to
acknowledge so far. There is a bit of them in the book but I cannot exactly
point where. Thank you Canus, née Cihan Bozsahin, Nezih Aytaclar, Zafer
Aracagök, Ugur Atak, Ragıp Gürkan, Justin Coven, Uttam Sengupta, Halit
Oğuztüzün, Samet Bağçe, Sevil Kivan, Aynur Demirdirek, Stasinos Konstantopoulos,
Mark McConville, Harry Halpin, İrem Aktuğ, Mark Ellison and
Stuart Allardyce. 

Mark Steedman, Ash Asudeh and Frederick Hoyt provided comments on 
much earlier drafts. Umut Ozge was less fortunate to have gone through sev- 
eral drafts. I owe some sections to discussions with him, and with Ceyhan 
Temürcü, Mark Steedman and Aravind Joshi. Elif Gok, Yağmur Sag, Süley- 
man Taşçı, Deniz (Dee!) Zeyrek and Alan Libert suggested corrections and 
clarifications for which I am grateful. Finally, special thanks to the Mouton 
team, Uri Tadmor, Birgit Sievert, Julie Miess, Angelika Hermann and the re- 
viewers for comments and assistance with the manuscript. Livia Kortvelyessy 
of Versita helped me get the project going. 

I am solely responsible for not heeding the good advice of so many good
people.


Chapter 1
Introduction 


On December 7, 1920, Moses Ilyich Schönfinkel made mathematical history
when he presented to the Göttingen Mathematical Society his results about
variables. It was to be his only work on the topic, which was prepared for
publication by Behmann (Schönfinkel 1920/1924). Little did he know
that in this brief seminar he was going to change the course of computing and
linguistics too, two fields which flourished in the remainder of the century.

He simply eliminated variables—bound variables. In theory, any lambda 
term with no free variables is a combinator in Schönfinkel's sense, and all
the bound variables in it can be eliminated. In practice, two combinators suf- 
fice to compute any discretely representable dependency, and that takes us 
to language and computing. We shall see that although this is good news 
for computing because we can rigorously identify a computable fragment 
of functions, it requires much extra effort in linguistics to become a theory 
because we will need some empirical criteria and a theory to constrain this 
power: we know that human languages do not manifest every computable 
dependency. 
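
The claim that two combinators suffice can be made concrete with a small sketch. The Python below is only an illustration under my own assumptions, not the notation used in the rest of the book: S and K are written as curried one-argument functions, the identity (I) and composition (B) are derived from them alone, and `succ` and `double` are hypothetical helper functions.

```python
# Combinators as curried, one-argument Python functions.
S = lambda f: lambda g: lambda x: f(x)(g(x))   # S f g x = f x (g x)
K = lambda x: lambda y: x                      # K x y   = x

# Identity from S and K alone: S K K x = K x (K x) = x
I = S(K)(K)

# Composition from S and K alone: B = S (K S) K, so that B f g x = f (g x)
B = S(K(S))(K)

# Hypothetical helpers, used only to exercise the derived combinators.
succ = lambda n: n + 1
double = lambda n: 2 * n

identity_result = I(42)                  # K 42 (K 42)
composed_result = B(succ)(double)(5)     # succ (double 5)
```

No lambda-bound variable is needed to define I or B once S and K are given; the Python lambdas appear only in the definitions of S and K themselves.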

The list of people who worked on the variable reads as a "who is who" in
mathematics and philosophy: Curry, Frege, Herbrand, Hilbert, Peirce, Rosser,
Skolem, and later, Quine and de Bruijn. Variables were a concern for the mathematician,
linguist, logician, philosopher, and the computer scientist. Naturally,
discovery of different methods was expected.

The way Schönfinkel went about it is what gave his work its overarching influence
beyond mathematics. He gave semantics to order, and order alone, by
devising an ingenious way to represent all argument-taking objects uniformly,
something that eluded Frege in his lifetime although he had anticipated it.

Schönfinkel represented a function f of n arguments (1a) as an n-sequence
of one-argument functions (1b). Assuming left-associativity for juxtaposition,
it is now standard practice to write (1b) as (1c), as Schönfinkel did.


(1) a. f(x₁, x₂, ..., xₙ)
    b. (...((f x₁) x₂) ... xₙ)
    c. f x₁ x₂ ... xₙ


This is what we now call Currying. (I must confess that Schönfinkeling
is alive in secret sects.) Haskell Brooks Curry was the first to realize the importance
of the technique, and made very frequent use of it in his arguments
(and Schönfinkel faded into oblivion), hence the name. The technique had
been taken for granted mainly because it was so simple.
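
The three notations in (1) can be mimicked in a few lines of Python. This is a minimal sketch under my own assumptions: `curry` and `volume` are hypothetical names, and the arity n has to be passed explicitly because Python functions are not curried by default.

```python
def curry(f, n):
    """Turn an n-argument function into a chain of n one-argument functions."""
    def take(args):
        if len(args) == n:
            return f(*args)                   # all n arguments collected: apply f
        return lambda x: take(args + (x,))    # otherwise wait for the next argument
    return take(())

def volume(x1, x2, x3):                       # (1a): f(x1, x2, x3)
    return x1 * x2 * x3

curried = curry(volume, 3)
step_by_step = ((curried(2))(3))(4)           # (1b): (((f x1) x2) x3)
juxtaposed = curried(2)(3)(4)                 # (1c): f x1 x2 x3, left-associative
still_waiting = curried(2)(3)                 # a function awaiting x3
```

The intermediate value `still_waiting` is itself a usable function, which is the property that the word-by-word interpretation of strings relies on.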

Its manifestation in language can be easily taken for granted too. It trans- 
lates to one-at-a-time word-taking in a surface string such as (2). 


(2) ((((((I wonder) who) Kafka) might) have) liked.)


Every parenthesized expression in the example except the outermost one 
is syntactically incomplete, yet semantically interpretable, in the sense of being
a function of type λx.m'x, where m' is a crude way to symbolize the
incrementally assembled semantics of each parenthesis.

For an n-argument object f(x₁, ..., xₙ) written in the traditional notation,
we can obtain a variableless representation of f using eta-reduction and combinators.
Variable-free functions capture the content and its combinatory behavior
without reference to extraneous objects.

An early comparison of variable-friendly syntax with variable-free syntax 
shows us that the aim is not to simply clean up theoretical tools and vo- 
cabulary. Its primary motivation is empirical: if a string of objects can have 
a variable-free function, then they are immediately interpretable. Taken to 
its logical conclusion, it means that any intermediate phrase has instantly 
available semantics. For example, the man who Mary loved is taken to arise 
from the structure the man who [Mary loved _ ] in variable-friendly syntax,
where the empty element (a syntactic variable) awaits interpretation. Consequently,
the phrase it is part of waits for interpretation too. In variable-free
syntax, Mary loved is semantically λx.C love' mary' x, which is eta-convertible
to C love' mary', where C is one of Curry's combinators. It does not need anything
else to be interpretable. It needs something to become a proposition,
but that is more than being interpretable. By direct import of combinators 
to variable-free syntax, we get immediately interpretable intermediate con- 
stituents as well. This is the main story of the book. 
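
The eta-conversion step for Mary loved can be checked mechanically. The sketch below is illustrative only and rests on my own encoding: meanings are Python tuples recording a predicate-argument structure in application order, and love' is assumed to take its object before its subject.

```python
C = lambda f: lambda x: lambda y: f(y)(x)      # C f x y = f y x

# love' takes its object first and its subject second (toy encoding of a
# predicate-argument structure as a tuple in application order).
love = lambda obj: lambda subj: ("love'", obj, subj)

with_variable = lambda x: C(love)("mary'")(x)  # λx.C love' mary' x
mary_loved = C(love)("mary'")                  # eta-reduced: C love' mary'

# Both are functions still waiting for the object of love'.
from_reduced = mary_loved("man'")
from_variable = with_variable("man'")
```

Both versions return the same structure for any argument, but only the second is built without mentioning a variable.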

Variables cannot be eliminated at the expense of lexical proliferation or
loss of semantics. For example, we cannot assume that love above is intransitive,
which would give the structure [Mary loved]. That is to say that all
strings are inherently typed as grammatical objects, such as the word loved
(transitive) and its meaning love', which is (e, (e,t)). Here I follow the tradition
of writing the meaning of words with primes. The ubiquitous adage The
meaning of life is life', attributed by Carlson (1977) to Barbara Partee and
Terry Parsons, will serve as a convenient base for compositional semantics in 
subsequent chapters. 

For us to continue giving semantics to any intermediate phrase, the argument
structure of words must be curried too. We can take λxλy.mark'xy to
be equivalent to mark', as in Twain marks two fathoms. Such curried abstractions
are required by phonology because we cannot substitute two or more
arguments at the same time.

We would be home and dry if all function-argument dependencies in language
were that simple, but we know that for example λx.fxx and λx.fx(gx)
are possible configurations, hence simple eta-conversion would not always
work. Some examples are Kinski adored himself, which has the dependencies
adore'kinski'kinski', and I have read without understanding, which is
(not'understand' x)(read' x i')i', for some x, for example the books I have read
without understanding.

The problem of capturing dependency, constituency and immediate inter- 
pretation is exacerbated by mixed-branching (3a) and right-branching (3b) 
demanded by language: 


(3) a. (I wonder) (who Kafka might have liked) (and what Wittgenstein 
might have written.) 
b. I (begin (to (try (to (avoid (reading Kafka before sleep.)))))) 


The informal notion of constituency that I employ here and denote with parentheses
is inextricably tied to dependency, intonation, informativity and
interpretability, and by direct import of combinators, to syntactic combinability.
It will be clarified throughout the book.

Notice the tension between left-to-right curried open interpretations such 
as (4a) and the rightward dependencies required by semantics, as in (3b). Both 
kinds of branching are reflected in syntax by constituency, for example (4b). 


(4) a. (((((((((I begin) to) try) to) avoid) reading) Kafka) before) sleep.)
b. (I begin to try to avoid), (and you should refrain from), (reading 
Kafka before sleep.) 


The inadequacy of eta-conversion for the semantic side of the constituents 
is where Schönfinkel's combinators come into the picture. For example, the
dependency in λx.fxx is not eta-reducible to f, hence we have no way of capturing
the dependencies in Kinski adored himself without variables or combinators.
We can eta-normalize λx.fxx to λx.Wfx →η Wf, without variables.
We can say that BSC symbolizes the dependency λx.f(gx)x which we
can observe in the bracketed part of the string the books [I have read _x
without understanding _x], without variables.


(5) BSC(not'understand')(read')i' = (not'understand')(read' i')i'

We can also assume that the inner dependency symbolized by the syntactic
variable '_x' is S:

(6) S(not'understand')read' = λx.(not'understand' x)(read' x)
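
The reductions behind W, (5) and (6) can be replayed symbolically. The following is a sketch under stated assumptions, not part of the theory: `Term` is a hypothetical helper class that simply records the arguments it is applied to, so that the result of a combinator can be printed as a predicate-argument structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """A symbolic constant; calling it records one more argument."""
    head: str
    args: tuple = ()

    def __call__(self, arg):
        return Term(self.head, self.args + (arg,))

    def __str__(self):
        if not self.args:
            return self.head
        return "(" + " ".join([self.head] + [str(a) for a in self.args]) + ")"

B = lambda f: lambda g: lambda x: f(g(x))      # B f g x = f (g x)
C = lambda f: lambda x: lambda y: f(y)(x)      # C f x y = f y x
S = lambda f: lambda g: lambda x: f(x)(g(x))   # S f g x = f x (g x)
W = lambda f: lambda x: f(x)(x)                # W f x   = f x x

adore, read, not_understand = Term("adore'"), Term("read'"), Term("not'understand'")
kinski, i, x = Term("kinski'"), Term("i'"), Term("x")

reflexive = W(adore)(kinski)                   # adore' kinski' kinski'
inner = S(not_understand)(read)(x)             # (6): (not'understand' x)(read' x)
outer = B(S)(C)(not_understand)(read)(i)       # (5): not'understand' (read' i') i'
```

Printing `reflexive`, `inner` and `outer` with `str` shows the duplicated argument in each case, which is exactly what plain eta-conversion cannot produce.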


Given the combinators and the process of eta-normalization, knowing 
a word in the combinatory sense becomes the problem of capturing its 
predicate-argument dependency structure in direct correspondence with its 
syntax and constituent structure, without variables. 

Schönfinkel's method allows us to capture the syntacticization of semantic
dependencies with a handful of combinators, all of which are based on
adjacency. Below is the semantic side of the story, where the strings in paren- 
theses are interpreted. 


(7) a. (Kafka adored) and Wittgenstein loathed mentors.
       B(T kafka')adore' = λx.adore'x kafka'

    b. I offered, and (may give), a flower to a policeman. (Steedman 1988)
       B²may'give' = λxλyλz.may'(give'xyz)

    c. He is the man I will (persuade every friend of) (to vote for). (Steedman 1996b)
       S pefo' tvf' = λx.pefo'x(tvf'x)

    d. (What you can) and what you must not base your verdict on (Hoyt
       and Baldridge 2008)
       O(λQ.?xQx)(you'can') = ?xλP.can'(Px you')


The combinators involved in (7) are all that we need for human languages. 
(And they have a common bond; see the conclusion.) This is the conjecture 
of CCG. The book attempts to show how CCG builds these dependency and 
constituency structures through syntactic types. It pairs phonological strings 
with predicate-argument structures in a radically lexicalized manner. 
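
Two of the equations in (7) can be tried out directly. The sketch below is an illustration under my own assumptions: `ADORES` is a made-up set of facts standing in for a model, and the string-building meanings for pefo' and tvf' are hypothetical placeholders for the two fragments in (7c).

```python
B = lambda f: lambda g: lambda x: f(g(x))      # composition
T = lambda a: lambda f: f(a)                   # type raising
S = lambda f: lambda g: lambda x: f(x)(g(x))   # substitution

ADORES = {("kafka", "milena")}                 # hypothetical (subject, object) facts

# adore' takes the object first, then the subject, as in (7a).
adore = lambda obj: lambda subj: (subj, obj) in ADORES

kafka_adored = B(T("kafka"))(adore)            # λx.adore' x kafka'

# (7c) with made-up string-building meanings for the two fragments.
pefo = lambda x: lambda vp: f"persuade(every_friend_of({x}), {vp})"
tvf = lambda x: f"to_vote_for({x})"
shared_gap = S(pefo)(tvf)                      # λx.pefo' x (tvf' x)
```

Here `kafka_adored` is already a function from objects to truth values before any object is seen, and `shared_gap` passes a single argument to both fragments.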

Here is the preview of the syntax of these constituents. We shall see 
how the semantically-motivated combinators above lead to the syntactically- 
realized ones below, by a direct translation made possible by the semantics 
of order. We will need a linguistic theory in addition to this translation because
the claim is that not all the combinators can materialize as syntactic.

(8) a.   Kafka        adored
           NP       (S\NP)/NP
        --------- >T
         S/(S\NP)
        ---------------------- >B
                S/NP

    b.       may              give
        (S\NP)/(S\NP)    (S\NP)/PP/NP
        ------------------------------ >B²
                (S\NP)/PP/NP

    c.  persuade every friend of    to vote for
              (S\NP)/VP/NP             VP/NP
        ---------------------------------------- >S
                      (S\NP)/NP

    d.    What      you can
        S/(S/NP)     S/VP
        ------------------ >O
            S/(VP/NP)

These examples also show the workings of a syntactic type-driven deriva- 
tion. The syntactic types of the meaning-bearing elements do all the work 
in derivations. By a common convention dating back to the 1930s (Ajdukiewicz
1935), the derivations are shown bottom-up, with leaves on top and the root 
at the bottom. Each line is a step of the derivation. Unlike phrase-structure 
trees which show a description of structure, these sequences are algorithms 
of structure-building by the string. The string span of a derivation shows the 
coverage of the substring for the derivation. The combinator that engenders 
the derivation is written at the right edge for exposition. In these examples
it is the syntacticized version of B, T, S and O, decorated as e.g. (>B). Semantic
assembly is immediate (and not always shown), precisely because of the
combinatory source of every syntacticized combinator. 

The syntactic types of (8) are related to semantic types of (7) systematically.
For example, (7a) suggests that kafka' is type-raised by T, which
manifests itself as the syntactic type S/(S\NP) in English (8a). It undergoes
B with adore' semantically according to (7a), which materializes syntactically
as the composition of S/(S\NP) and (S\NP)/NP, which is an instance
of syntactic B, X/Y Y/Z → X/Z. Traditional constituents which are familiar
from tree drawings, such as those in (9), will turn out to be a consequence of
the combinatory primitive, function application, decorated as (>) and (<).

(9) a.  Kafka      adored      Milena
          NP     (S\NP)/NP       NP
                 ----------------------- >
                          S\NP
        -------------------------------- <
                        S

        Tree: [Kafka [adored Milena]]

    b.  I      gave        a flower    to a policeman
           (S\NP)/PP/NP       NP             PP
           ------------------------ >
                  (S\NP)/PP
           ------------------------------------------ >
                            S\NP

        Tree: [[gave a flower] to a policeman]
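
The type-driven steps in (8a) and (9a) can be sketched as operations on category objects. This is a minimal illustration under my own assumptions, not an implementation of CCG: `Fn` and the four rule functions are hypothetical names, categories carry no features or modalities, and only forward type raising and forward composition are covered.

```python
from typing import NamedTuple, Union

class Fn(NamedTuple):
    """A complex category: res/arg seeks arg to the right, res\\arg to the left."""
    res: "Cat"
    slash: str
    arg: "Cat"

Cat = Union[str, Fn]                      # atomic categories are plain strings

def forward_apply(left, right):           # X/Y  Y   =>  X     (>)
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.res
    return None

def backward_apply(left, right):          # Y  X\Y   =>  X     (<)
    if isinstance(right, Fn) and right.slash == "\\" and right.arg == left:
        return right.res
    return None

def forward_compose(left, right):         # X/Y  Y/Z =>  X/Z   (>B)
    if (isinstance(left, Fn) and isinstance(right, Fn)
            and left.slash == "/" and right.slash == "/"
            and left.arg == right.res):
        return Fn(left.res, "/", right.arg)
    return None

def forward_raise(cat, target="S"):       # X  =>  T/(T\X)     (>T)
    return Fn(target, "/", Fn(target, "\\", cat))

VP = Fn("S", "\\", "NP")                  # S\NP
adored = Fn(VP, "/", "NP")                # (S\NP)/NP

kafka = forward_raise("NP")                        # (8a): S/(S\NP)
kafka_adored = forward_compose(kafka, adored)      # (8a): S/NP
sentence = backward_apply("NP", forward_apply(adored, "NP"))   # (9a): S
```

The nonstandard constituent Kafka adored receives the category S/NP, and applying it to an NP yields S, the same result that (9a) reaches by application alone.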


It would be tempting to think of slash introduction (directionality) on the 
syntactic side as the equivalent of eta-conversion on the semantic side, but 
that would be misleading. If it were true, we could do syntax completely 
with semantic types. The syntactic type of the word adore above is indeed 
equivalent to its eta-normalizable semantics λxλy.adore'xy, i.e. one slash per
lambda-binding, but these slashes depend on surface adjacency, hence e.g.
(S/NP)/NP would be wrong for adore or for any English transitive verb.
Additionally, some lambdas are not syntactic lambdas, e.g. λx.man'x for the
word man, which is eta-normalizable to man' but its syntax is not N/N or
N\N in English. These aspects show that combinators and their one-to-one 
syntacticization do not amount to a linguistic theory. This is where the lin- 
guistic theorizing begins for combinators. 

Let me finish the preliminaries of the book with an assessment of Schönfinkel
by Quine. I shall return to this quote in the final chapter.


It was by letting functions admit functions generally as arguments that Schönfinkel
was able to transcend the bounds of the algebra of classes and relations
and so to account completely for quantifiers and their variables, as
could not be done within that algebra. The same expedient carried him, we
see, far beyond the bounds of quantification theory in turn: all set theory was
his province. His C, S, U and application are a marvel of compact power. But
a consequence is that the analysis of the variable, so important a result of
Schönfinkel's construction, remains all bound up with the perplexities of set
theory. Quine (1967: 357)


The essence of combinators for language is to turn a simple concept like adjacency
into a scientific tool with clear limits and predictable syntactic and semantic
(im)possibilities, precisely because variables are eliminated to model
adjacency or adjacency-like effects. And without them constituency and de- 
pendency can easily tell whether our hypothesis about a certain construction 
is right or wrong. That seems desirable for achieving descriptive adequacy 
of grammars. There is very little degree of freedom when the entire theory 
is based on a single understanding of adjacency. That will hopefully carry an 
explanatory force when syntax and semantics are considered together. 

The rest of the book is organized as follows. Chapter 2 introduces type- 
dependent syntax, where the driving force of the syntactic process, the syn- 
tactic type, arises from the semantics of combinators. Chapter 3 presents ar- 
gumenthood from the perspective of combinators. This is crucial for lexical 
capture of dependencies in the predicate-argument structure. It also suggests 
that the lexicon might be the source of undecidability if and when it is rele- 
vant. A more revealing aspect of combinators turns out to be what they deliver 
about discrete representability, rather than infinitude or decidability. Chap- 
ter 4 shows that the syntactic types of combinators cannot be arbitrary, due 
to having the same base for syntactic and semantic juxtaposition. Chapter 5 
builds a substantive base on these formal foundations to present CCG as a lin- 
guistic theory. Chapters 6 through 8 discuss some variations in CCG theory: 
logical form (Chapter 6), possible constraints on all grammars (Chapter 7), 
and possible extensions of the invariants (Chapter 8). Chapter 9 evaluates 
some linguistic, philosophical, computational and cognitive aspects of CCG, 
all of which stem from bringing semantics into the explanation. Chapter 10 
shows that CCG’s computation must distinguish opaque and transparent pro- 
cesses, and that this leads to a syntactic simplification of its primitives to a 
single operation rather than two. 

In conclusion (Chapter 11) a historical perspective is reiterated where adjacency
as the sole hypothesis-forming device is singled out as CCG’s most
distinctive aspect, rather than variable elimination. This seems to be Schönfinkel's
legacy.


Chapter 2 
Order as constituent constructor 


The semantic dependencies in a PADS must manifest themselves transpar- 
ently in syntax for them to take part in constituencies and their interpretation, 
and for order (therefore adjacency) to remain as the only explanatory device 
for the syntax-semantics connection. This process will be called syntacticiza- 
tion throughout the book. The result is the embodiment of combinatory be- 
havior in complex symbols called syntactic types. 


1. Combinatory syntactic types 


The notion of syntactic type has been imported to linguistic explanation, to 
the best of my knowledge, by Bar-Hillel, Gaifman and Shamir (1960), Mon- 
tague (1970) and Gazdar (1981). Gazdar credits Harman (1963) for the first 
use of complex symbols in phrase-structure grammars, whereas Bar-Hillel et
al.'s and Montague's use relates to Leśniewski’s and Russell's types as functions.

Formally speaking, a type is a set of values. For example, we can think of
the grammatical relation subject as a type, in English standing for the set of
values {John, Mary, he, she, it, ...}. We can distinguish it from other types, say
from the type object, which would be in English the set {John, Mary, him, her,
it, ...}. We can also think of types for verbs, such as tv for transitives, which
would be the set {hit, devour, read, ...}, and iv for intransitives, say {arrive,
sleep, read, ...}. These sets can be countably infinite, which makes their finite
representation by a type label even more significant.

Montague's deployment of a Russell-style type-theoretic language aims to
give rise to meaningful expressions from a semantic type α (his ME_α), hence
a simple label such as “subject” above would not do in his framework. His
choice is to syntacticize a denumerable number of types α by building them
into ME_α's. Such construal need not be variableless or order-induced.

As atomic labels in a phrase-structure grammar, types would bear no more
significance than the distributional notion of a category, such as N, V, A and P,
for nouns, verbs, adjectives and prepositions, which are commonly employed
in linguistics. This was the motivation for Harman (1963) to make complex
symbols first-class citizens of a phrase-structure grammar. In such complex
symbols the notion of structure is assumed, rather than explained by adja- 
cency. 

What brings surface string generalizations of types from order-predicted
semantics and syntax is the notion of a combinatory syntactic type, as employed
in categorial grammars. For example, we can refine the type "subject"
above as S/(S\NP) for English, which says that any value that takes a rightward
VP as domain (because the label VP is typewise S\NP), and yields a
sentence as a result, belongs to the set of subjects. The “object” type would
be different, for example S\(S/NP) for English. Although syntaxwise they
differ, they arise from the same semantics, which is that of T, because λP.Pa'
is the semantics underlying this type, which means all functions P in which
a' participates as an argument, which is a unary application of the combinator
T.


(1) T ≝ λxλy.yx
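
The point that the two type-raised categories differ in syntax but share one semantics can be sketched as follows. This is a toy encoding under my own assumptions: `type_raise` is a hypothetical helper, categories are plain strings, and `predicate` stands in for any function P.

```python
T = lambda x: lambda y: y(x)                   # (1): T = λxλy.yx

def type_raise(sem, role):
    """Pair an English type-raised category with its T semantics (toy encoding)."""
    syn = "S/(S\\NP)" if role == "subject" else "S\\(S/NP)"
    return syn, T(sem)

subj_syn, subj_sem = type_raise("wittgenstein'", "subject")
obj_syn, obj_sem = type_raise("wittgenstein'", "object")

predicate = lambda a: ("P", a)                 # any function P applied to a'
```

Applying either `subj_sem` or `obj_sem` to `predicate` gives the same structure; only the syntactic half of each pair records the direction in which P is sought.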


Syntacticization in this particular case refers to how the semantic dependencies
engendered by T directly import to syntactic types such as S/(S\NP)
and S\(S/NP) without further assumption. As shown in Chapter 4 the process
is transparent, but it is not trivial, because syntactic dependencies carry different
features than what is borne by semantic objects. For example, English
subject-verb agreement spells the distinction S/(S\NPagr) for subjects and
S\(S/NP) for nonsubjects where “agr” is a feature bundle for agreement. For
Welsh, a strictly VSO language with subject-verb agreement, the distinction 
is between S\(S/NPagr) for subjects and S\(S/NP) for nonsubjects. Another 
lexical resource, the verb in this case, complements the picture by bearing 
the lexical type S/NP/NPagr for a Welsh transitive verb and S/NPagr for an 
intransitive. These types are (S\NPagr)/NP and S\NPagr for English. 

The type S/(S\NP3s) for the word “Wittgenstein” syntactically denotes 
all functions that can be construed as an English speaker's knowledge 
of all things “Wittgenstein” can grammatically do, in semantic terms, 
λP.P wittgenstein', because it captures the following contrasts. 


(2) Wittgenstein adores/*adore westerns. 
Does/*do Wittgenstein adore westerns? 
Milena writes more letters than Wittgenstein does/*do. 
Wittgenstein I am sure takes/*take more notes than he publishes. 
Wittgenstein you say is the one who adores/*adore westerns? 
They adore/*adores Wittgenstein for that? 
Wittgenstein I like/*likes, Russell I doubt. 
*?the film which might startle the critics and Wittgenstein would adore 
the film which might startle the critics and which Wittgenstein would 
adore 


We can also symbolize all things that can be predicated over “Wittgenstein” 
in the syntactic type S\(S/NP) for an English speaker, which also 
has the semantics λP.P wittgenstein'. It takes another lexical resource to turn 
this into agreement. Because the English verb does not have (S\NP)/NPagr 
for agreement, this possibility is avoided in English syntax. Therefore 
the category S\(S/NP) serves an entirely different syntactic function than 
the S/(S\NP3s) of a subject participant. The knowledge of the word “Wittgenstein” 
is then construed as all possible categories that it can bear, in the form 
of syntactic type/predicate-argument structure pairs. 

This construal of syntax-semantics correspondence can be compared with 
other type-dependent approaches. In Montague’s type system, where order 
does not step in to provide an interpretation, the type of a transitive verb 
is ((e,(e,t)),(e,t)), which is model-ready for interpretation. In this sense, 
Montague's Intensional Logic is dispensable as he pointed out himself (Mon- 
tague 1970), in favor of a model-theoretic interpretation; see e.g. Dowty, Wall 
and Peters (1981) for discussion. In our case the type simply refines (or con- 
strains) the correspondence of the syntactic type to its PADS. It is part of what 
computational linguists call a typed “glue language." 


2. Directionality in grammar: morphology, phonology or syntax? 


The term string type descriptor for ‘:=’ in he := S/(S\NP3s) brings to mind 
whether we could entertain the possibility that some of these contiguous 
strings, namely words in the ordinary sense such as adores are derived com- 
binatorially, or taken as axioms (lexical items) of a combinatory system. 
The first view is adopted here without elaboration. The second view would 
amount to taking ‘:=’ as the lexical type assignment operator. Equivalently 
we would be asking whether the word-internal compositional meaning as- 
sembly and constituency are mediated by the combinators as well, which is 
implicated by the view preferred here. I do not elaborate on it because the 
book covers no lexical dependency which refers to a part of another word. 
The question brings forth the issue of morphology-phonology interaction 
during syntactic type-driven derivation. I will say nothing about these aspects 
in this book, because they need a book-length treatise of their own, 
which is upcoming work. Suffice it to say that we need to have a closer look 
at the Separation Hypothesis in morphology (Beard 1987, 1995), that morphological 
and phonological types do form assembly, and syntactic-semantic 
types the meaning assembly. Modern morphological theories such as those 
of Lieber (1980), McCarthy (1981), Anderson (1992), Halle and Marantz 
(1993), Aronoff (1994), Beard (1995) and others need studying from a type-dependent 
perspective, to see if combinators are responsible for the meaning 
assembly in constructions involving parts of words and phrases. 

Thus we will not be concerned whether the derivation of the following 
example from Arabic must compose the passive and the causative first, by 
B as shown, or whether we apply them one-at-a-time to the stem, which is 
also possible with the same type assumptions. 


(3)  -u-                    -h-                       dahika         Ahmad   Nadeem 
     PASS                   CAUS                      laugh          A       N 
     (S/NP/NP)/(S/NP/NP)    (S/NP/NP)/(S/NP)          S/NP           NP      NP 
     : λPλxλy.pass'(Pyx)    : λPλxλy.cause'(P(y))x    : λx.laugh'x   : a'    : n' 
     ------------------------------------------>B 
     -uh- := (S/NP/NP)/(S/NP) 
          : λPλxλy.pass'(cause'(P(x))y) 

     duhhika := S/NP/NP : λxλy.pass'(cause'(laugh'x)y) 
     duhhika Ahmad := S/NP : λy.pass'(cause'(laugh'a')y) 
     duhhika Ahmad Nadeem := S : pass'(cause'(laugh'a')n') 
     ‘Ahmad was made to laugh by Nadeem.’ 
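
The two routes mentioned for (3), composing the passive and the causative first by B or applying them one at a time to the stem, can be compared in a small sketch. It assumes a curried Python encoding in which terms are built as strings; the function names are illustrative assumptions, and only the lambda terms themselves come from (3).

```python
def B(f):
    """Composition: Bfg = λx.f(gx)."""
    return lambda g: lambda x: f(g(x))


def passive(P):
    """-u-: λPλxλy.pass'(Pyx)"""
    return lambda x: lambda y: f"pass'({P(y)(x)})"


def causative(P):
    """-h-: λPλxλy.cause'(P(y))x"""
    return lambda x: lambda y: f"cause'({P(y)}){x}"


def laugh(x):
    """dahika: λx.laugh'x"""
    return f"laugh'{x}"


# Route 1: compose PASS and CAUS by B, then apply to the stem and the NPs.
composed_first = B(passive)(causative)(laugh)("a'")("n'")

# Route 2: apply CAUS to the stem, then PASS to the result.
one_at_a_time = passive(causative(laugh))("a'")("n'")

print(composed_first)
print(one_at_a_time)
```

Both routes deliver the same term under these type assumptions, which is why the choice between them is left open in the text.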


Notice also the assumption that morphology and phonology somehow get 
it right that -uh- is a templatic infix to the verb stem. Crucially, the directionality 
of the slashes does not reflect the morphology of Arabic. It is a syntactic 
constraint with a semantic motivation; in this case for example the passive 
looks for lexical verb categories. 

We get grammatical derivations the same way in all languages by medi- 
ating them only through syntactic types, independent of their morphological 
or phonological typology, including for example the templatic morphology 
of Arabic, because syntactic dependencies relate to compositional semantics 
of words as they are embodied in syntactic types. The combinatory theory 
described in the book goes as far as claiming that the types above regulate the 
scope of e.g. the passive and the causative, because they are syntactic processes. 
They regulate the behavior so that we get pass'(cause'(laugh'a')n') 
above, not cause'(pass'(laugh'a')n'), which is what we would also get if we 
let morphological types and phonology do the semantics, say by applying the 
passive a → u to dahika first, and then the geminate causative to /h/. How 
these types arise from interfaces morphologically and phonologically is not 
covered in the book. 

Further empirical support for dissociating syntactic directionality from 
morphological or phonological directionality comes from languages such 
as Kʷakʷ'ala where some nominal inflections fall on the preceding word, 
whatever its category. For example, in Figure 1, -s and -is are suffixes on 
lewinux"'a but they relate syntactically to mestuw-i. Similarly, -ida is a suffix 
on the preceding verb to which it bears no syntactic relation. The slashes in 
the figure reflect syntactic directionality rather than suffixation or morpho- 
logical order. 

In summary, the slash can only do syntactic work in a combinatory the- 
ory. If it takes on other duties such as morphological order (as it does in some 
versions of categorial grammar such as Hoeksema 1985), it cannot simulta- 
neously undertake morphological work and afford not to immediately deliver 
semantics of some constituents. It would be forced to do that when composing 
the preceding word of an inflected nominal in Kʷakʷ'ala morphologically and 
phonologically because the semantics of the inflections would be unrelated to 
the morphological/phonological host. Positing phonologically vacuous types 
to remedy the problem would undermine the combinatory base of grammar 
because, in the process of syntacticization, only phonologically discernible 
elements can be given immediately deliverable semantics by combinators. 
Relaxing the directionality interpretation of a combinatory slash to allow syn- 
tactic, morphological or phonological order is not a degree of freedom in a 
combinatory grammar. 


3. Trees and algorithms 


The preceding discussion suggests that what we see in a combinatory deriva- 
tion is a step-by-step syntactic and semantic assembly, not morphology or 
phonology. The style of the presentation wants explaining. Drawing the 
derivation in (3) as a tree reveals its strictly binary nature. This is shown 
in Figure 2. The same derivation could be drawn using the more familiar tree 
notation (Figure 3), but it would be misleading for three reasons. 

Figure 1. Kʷakʷ'ala's syntactic bracketing, adapted from Anderson (1992: 19). 
Figure 2. A CCG derivation as a tree. 
Figure 3. A CCG derivation as a phrase-marker tree. 


First, structure-building in a combinatory derivation crucially depends on 
the linear sequence of types, which is explicit in a notation such as (3) but not 
in a phrase-structure tree. 

Second, the combinatory process must start with the lexical assumptions. 
Otherwise there would be no way to achieve the immediate assembly of lex- 
ically projected semantics, whereas a tree can be built top-down, bottom-up 
or with a mixed strategy. In other words, a combinatory derivation is a constructive 
proof, an algorithm, of the structure-building, whereas a tree is its 
description. 

Third, there are no intermediate records in a CCG derivation, which also 
breaks the ties with logical proofs. There is no sense in which any subtree 
would be available for reinterpretation, reuse, retraversal or reinspection. 
For example, after the derivation of duhhika above, we have the substrings 
duhhika, Ahmad and Nadeem as remaining work, without any rework or inspection. 
This is most explicit in line drawings, which can be viewed as walls 
built around the range of the derivation. I will use the standard linear notation 
throughout the book. 


4. CCG’s narrow claims in brief 


Combinators as syntactic tools must encode and project dependencies just 
like they do when they operate on semantic objects. We must preserve this 
property throughout syntacticization so that we can claim the same origin 
(order) for structure and its interpretation. For example, a binary version of 
B, as in Bfg = λx.f(gx), suggests that f depends on g, which depends on 
x, whatever x is when it is instantiated. No combinatory rule or dependency 
can change the dependence of f and g on x once we obtain λx.f(gx) by B. 
Parenthesis-free combinators such as C encode and project dependencies too: 
Cfab = fba, hence the order of the arguments matters to f in this example; it 
is a genuine dependency. 
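
The dependencies that B and C project can be made visible in a short sketch, assuming curried Python functions over strings. The helper names `f`, `g` and `pair` are hypothetical stand-ins for arbitrary functions.

```python
def B(f):
    """Bfg = λx.f(gx): f depends on g, which depends on x."""
    return lambda g: lambda x: f(g(x))


def C(f):
    """Cfab = fba: the two arguments reach f in swapped order."""
    return lambda a: lambda b: f(b)(a)


def f(v):
    return f"f({v})"


def g(v):
    return f"g({v})"


def pair(first):
    """A two-place function that records the order in which arguments arrive."""
    return lambda second: f"f {first} {second}"


b_result = B(f)(g)("x")        # x feeds g, and g's value feeds f
c_result = C(pair)("a")("b")   # b is delivered before a
print(b_result)
print(c_result)
```

Once `B(f)(g)` is formed, nothing downstream can make `f` see `x` directly, which is the sense in which the dependency is fixed by the combinator.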

The syntactic process of combination might look similar in spirit to dependency 
grammars such as Tesnière (1959), Hudson (1984) and Mel'čuk (1988). 
However, the narrower claim is that only the syntactic types bear on con- 
stituent structure, and they arise from semantics of order, therefore the pro- 
cess of syntacticization is crucial, and adjacency is all we need for it. 

Having no degree of freedom from adjacency will force us to entertain nar- 
row hypotheses about possible syntactic categories, therefore possible gram- 
mars, and about a basic inquiry of linguistics promoted in the preface: 


(4) A Humean question for linguistics: 
Why do we see limited dependency and constituency in natural lan- 
guage syntax? 


Here is a brief preview of the limited constituency engendered by the 
syntacticized combinators. Although maximal left bracketing is allowed, not 
all substrings are constituents, for example *(mathematicians in)(ten). Some 
constituents are quite unorthodox, such as I know that three and you think that 
four mathematicians in ten prefer corduroy. This much is inferrable from the 
well-formed fragment of (5c). 


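(5) a. I          know         that three mathematicians in ten prefer corduroy. 
       S/(S\NP)   (S\NP)/S' 
       --------------------->B 
       S/S' 
    b. I know   that       three mathematicians in ten prefer corduroy. 
       S/S'     S'/Sfin 
       ----------------->B 
       S/Sfin 
    c. I know that   three           math.   in          ten prefer corduroy. 
       S/Sfin        (S/(S\NP))/N    N       (N\N)/NP 
       ---------------------------->B² 
       (S/(S\NP))/N 
                                     ------------------- ?? 
    d. I know that three   mathematicians in ten   prefer       corduroy. 
       (S/(S\NP))/N        N                       (S\NP)/NP    NP 
       ------------------------------------------> 
       S/(S\NP) 
       ------------------------------------------------------->B 
       S/NP 
       -------------------------------------------------------------------> 
       S 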

All constituents are immediately interpretable, and none of the noncon- 
stituents are interpretable. These are the combinatory predictions about order- 
engendered constituent structure. 


5. Type-dependence versus structure-dependence 


A further consequence of CCG's narrow claims is that all natural language 
grammars must be type-dependent to be able to deliver all and only the im- 
mediately interpretable constituents. Type-dependence as a research program 
does not deny the structure-dependence of natural language strings. The main 
goal is to explain structure-dependence as arising from something other than 
structure, from adjacency and its semantics. Positing a sequential origin for 
structure presumes that structure-dependence is an epiphenomenon, and with 
it goes the primary use of variables for structure-building. 

Perhaps the best known work on variables in syntax is Ross's (1967) Coordinate 
Structure Constraint (CSC). His thesis was a bold attempt to constrain 
the syntactic variables. The motivation was to avoid overgeneration of 
the semantics of the constructions involving such variables. Putting 
together the desire to constrain the semantic behavior, and employing syn- 
tactic variables for this task, we can conclude that these variables must range 
over structures, rather than strings or words. 

Structure-dependence is the hallmark of transformationalism, both in the 
theory and in the data. Chomsky's transformations have changed over the 
years, but they have always maintained one property: structure preservation. 
According to this theoretical dictum transformations only apply to structured 
strings, represented as phrase-markers, to produce structured strings. In terms 
of data, assuming structure-dependence is the starting point for the nativist 
explanations of language acquisition (Crain and Pietroski 2001). 

I will present structure-dependence and type-dependence in their own 
terms, and compare their claims. In the examples of structure-dependence 
below where the process of question formation pairwise relates a-examples 
to b-examples, the relevant relations are structural dominance and structural 
locality of labels. 


(6) a. Kafka [liked Milena]VP. 
    a'. John [thinks that Kafka [liked Milena]VP]VP. 
    a". [The lady who I [think Kafka [likes]VP]VP]NP [adored flowers]VP. 
    b. Did Kafka like Milena? 
    b'. Does/*did John think that Kafka liked/*likes Milena? 
    b". Did/*do/*does the lady who I think Kafka likes/*like/*liked adore 
        flowers? 


I use the notation [ ]T to represent the syntactic label T of the substring 
in brackets. For example, in (6a'), the inner VP is dominated by the outer VP. 
In (6a"), the outermost NP and the last VP are structurally sisters, hence local 
to each other. The stars in the b' and b" examples are meant to indicate that the 
meaning conveyed by an a-example cannot be questioned like the corresponding 
starred b. 

If structural dominance were not critical, we would have the starred do's 
in the b-examples as grammatical. If locality were not the determinant for the 
sisterhood of the subject, the starred like examples in b's would be fine too. 

A simple inductive heuristic on the position of do or like (“for the choice 
of do, use a verb that appears later when the string is longer"), which might 
work for (6a"), would not work for (7a). Similarly, a simple label match of 
VP by order would not work either (7b-c). 
(7) a. The man who sleeps liked the lady who reads Kafka. 
b. Kafka [[while sleep]VP-ing]AdvP dreamed about Milena. 
c. *Did Kafka while sleep dreamed about Milena? 

These examples are type-dependent as well as being structure-dependent. 
For example, we can think of yes-no questions as imposing the following 
constraints on do, where the syntactic labels are now combinatory syntactic 
types (constraints) rather than distributional categories. 


(8) a. . . . ? 
    b. [Does]Syn/(Sinf\NP)/NP3s Kafka like Milena? 
    c. [Do]Syn/(Sinf\NP)/NP2s you like Milena? 


With these assumptions, (9a) is ruled out by type-dependence without the 
help of structure-dependence. The inner Sinf\NP is not visible to the word 
does, and the string think...Milena cannot bear the syntactic type Sinf\NP. 


(9) a. *[Does]Syn/(Sinf\NP)/NP3s Kafka [think [adore Milena]Sinf\NP]? 
    b. Do [you]NP2s think that [Kafka]S/(S\NP3s) liked/likes/*like 
       Milena? 
    c. liked := (Sfin\NPagr)/NP 
    d. likes := (Sfin\NP3s)/NP 
    e. like := (Sfin\NP¬3s)/NP 
    f. like := (Sinf\NP)/NP 


Agreement is always encoded for subjects, as in NP2s for you in (9b), also 
(Syn/(Sinf\NP))\((Syn/(Sinf\NP))/NP2s), and for Kafka, as S/(S\NP3s). This 
is enforced by the lexical differences in (9c-f). As in structure-dependent 
accounts, the category of embedded likes cannot project as the type of the 
clause headed by think. The critical type-dependent steps are shown below: 


(10) Do                    you                                       think that       Kafka likes Milena? 
     Syn/(Sinf\NP)/NP2s    (Syn/(Sinf\NP))\((Syn/(Sinf\NP))/NP2s)    (Sinf\NP)/Sfin   Sfin 
     ---------------------------------------------------------------<  -----------------------------------> 
     Syn/(Sinf\NP)                                                     Sinf\NP 

Notice also that the choice of liked and likes in (9b) is not related transformationally 
as in (6a'/b'). They produce different semantics to begin with, which 
is a consequence of radical lexicalization. There are no deeper structures, with 
surface structures derived from them. 

Structure-dependence and type-dependence begin to make different predictions 
when we observe that there might be (a) same structures which must 
bear different types, and (b) different structures which must bear the same 
type. In a type-dependent theory, different types mean differential behavior, 
and having the same type means manifesting the same syntactic behavior. The 
first kind is CCG's answer to CSC, without extraneous constraints, principles 
or variables. Let me briefly exemplify case (b) before we move to CSC. I will 
draw on Turkish data. 

Common nouns and adjectives in Turkish are collectively called substan- 
tives because they show similar morphological characteristics when used as 
nouns, such as the same case, person and number marking. Their common 
semantics, that of being a property, which is syntactically NP/NP, is trans- 
parently imported to Turkish syntax in structures that widely differ in their 
internal structure but behave similarly in syntax. 

We can for example form relative clauses which differ structurally in sub- 
ject versus nonsubject extraction (11a—b), but both kinds can be headless as 
well, in which case they undergo the nominal paradigm in inflections as if 
they were noun stems (11c-d). 

(11) a. [Istanbul'a gid-en]NP/NP otobüs 
        Ist-DAT     go-REL       bus 
        ‘The bus that goes to Istanbul’                         Turkish 
     b. [Istanbul'a git-tiğ-im]NP/NP otobüs 
        Ist-DAT     go-REL.1s        bus 
        ‘The bus with which I went to Istanbul’ 
     c. [[Istanbul'a gid-en]NP/NP]NP-ler-i ben gör-me-di-m. 
        Ist-DAT      go-REL-PLU-ACC        I   see-NEG-PAST-1s 
        ‘I did not see the ones that go to Istanbul.’ 
     d. [[Istanbul'a git-tik]NP/NP]NP-ler-im daha güzel-di. 
        Ist-DAT      go-REL-PLU-POSS.1s      more beautiful 
        ‘The ones with which I went to Istanbul looked better.’ 
In these examples the headless variety cannot be thought of as cases where 
biri ‘one’ is deleted. For example, (11a) and (11c) are related and the readings 
are quantificational, but if we use biri or sey ‘thing’ in (11c), e.g. Istanbul'a 
giden şeyleri ben görmedim (‘I did not see the ones that went to Istanbul’), it 
is nonquantificational. Therefore these are different structures. The examples 
have the additional property that, independent of the structural source, be 
they a suffix, a lexically specified adjective (12), or a derived clause such 
as a headless relative clause, they can behave as anaphors if their type is a 
predicative NP. They have a unique semantic function syntactically. 


(12) [Zengin]NP/NP kriz-den   etkile-n-me-di. 
     Rich          crisis-ABL affect-PASS-NEG-PAST 
     ‘The rich has not been affected by the crisis.’ 


In other words, Turkish seems to make no distinction in syntactic behavior of 
the types NP/NP and NP if the semantic origin of the NP is that of a property, 
independent of its internal structure. Compare the clausal structure of these 
examples with a nominal NP structure (13). 


(13) [Her yeni otobüs-ün  koltuğ-u]NP 
     every new bus-GEN.3s seat-POSS.3s 
     ‘every new bus's seat’ 


The other case which differentiates type-dependence from structure- 
dependence is when similar structures show differential application in syntax, 
as in CSC. 

Ross's solution to CSC, that coordinands are islands of extraction with a 
single escape boat, which is to extract across the board (ATB) from each co- 
ordinand, and only for constituents with the same grammatical function in ev- 
ery coordinand, proved to require transderivational constraints for structure- 
dependent theories. No one has come up with an effective and nonarbitrary 
solution to such constraints which would keep the problem in the class of re- 
cursive languages describable by transformational grammars; see Peters and 
Ritchie (1973). 

Through the syntacticization of combinators, the CSC becomes a type 
constraint without variables, kept well inside recursive languages; in fact it 
is nearly context-free. Here is the combinatory solution to the problem, as 
worked out mainly by Gazdar (1988) and Steedman (2000b). The type constraint 
is that the coordinands must be like-typed, enforced by the coordinator's 
lexical category (X\X)/X in (14).¹¹ 

(14) a. The cat that [John admires]S/NP and [Mary hates]S/NP 
     b. *The cat that [John admires]S/NP and [bites Mary]S\NP 
     c. *The man that [admires John]S\NP and [Mary detests]S/NP 
     d. The man [that admires John]N\N and [(that) Mary detests]N\N 
                                                       Steedman (2011: 94) 
     e. *The cat that [John admires]S/NP and [Mary hates it]S 
     f. *The cat that [John admires it]S and [Mary hates]S/NP 

The similarity of the argument to the structure-dependent explanation, that 
coordinands must be like-categories in the structural sense, is illusory; it is 
the computation of this constraint that makes structure-dependent theories 
Turing-complete, and type-dependent ones (in the combinatory sense) nearly 
context-free. 
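
The like-type constraint imposed by (X\X)/X can be illustrated with a minimal sketch. It assumes that categories are plain strings compared for identity, which ignores features and unification, and it takes the coordinand types of (14) as S/NP, S\NP, N\N and S; the function name `coordinate` is a hypothetical helper.

```python
def coordinate(left, right):
    """Instantiate X in the coordinator category (X\\X)/X.

    The right coordinand fixes X; the left coordinand must then bear the
    same type. Returns X on success and None otherwise.
    """
    x = right
    return x if left == x else None


# Coordinand types as assumed for (14a-f).
examples = {
    "14a": ("S/NP", "S/NP"),
    "14b": ("S/NP", "S\\NP"),
    "14c": ("S\\NP", "S/NP"),
    "14d": ("N\\N", "N\\N"),
    "14e": ("S/NP", "S"),
    "14f": ("S", "S/NP"),
}

results = {label: coordinate(l, r) for label, (l, r) in examples.items()}
for label, outcome in results.items():
    print(label, outcome if outcome is not None else "*")
```

No variable over structures is consulted in this check; the coordinator only compares the two types it is adjacent to.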


6. Constituency 


Combinators as semantic objects cannot be the explanation why we see lim- 
ited kinds of type dependencies in syntax. For example, we shall see that 
S can hardly be the explanation for the dependencies in Mary wanted to love, 
although they are certainly describable by S, because Sfga = fa(ga), thus 
S(Cwant')love'mary' = want'(love'mary')mary'. But this combinator is precisely 
the syntactic explanation for the dependencies in He is the man I will 
persuade every friend of to vote for, and both reasons have to do with con- 
stituency as we shall later see. 
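
The equation for S given above can be checked mechanically. The sketch below assumes curried Python functions that build terms as strings; `want` and `love` are toy stand-ins for want' and love'.

```python
def S(f):
    """Sfga = fa(ga): the argument a is passed to both f and g."""
    return lambda g: lambda a: f(a)(g(a))


def C(f):
    """Cfab = fba."""
    return lambda a: lambda b: f(b)(a)


def want(complement):
    """want' takes its complement first, then its subject."""
    return lambda subject: f"want'({complement}){subject}"


def love(subject):
    return f"love'{subject}"


# S(C want') love' mary' = C want' mary' (love' mary') = want'(love' mary') mary'
result = S(C(want))(love)("mary'")
print(result)
```

That the term comes out right shows only that S describes the dependency; it does not by itself show that S is the syntactic source of it.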

Some dependencies are nonexistent semantically and syntactically, al- 
though they are describable by the combinators that operate in syntax. For 
example, there is no language in which the pseudo-English expression John 
expects that Barry could mean ‘John expects Barry to expect’. Its semantics 
would be expect'john'(expect'barry'). It is describable by S, C and T: 
S(CCjohn')(Tbarry')expect', which is equivalent to the purported dependencies, 
expect'(expect'barry')john'. It will turn out to be a conspiracy of syntactic 
types of nominals and verbs, therefore not a theoretical impossibility but 
lexical improbability. The coconstraining behavior of syntactic types and se- 
mantics is a major concern of the book for this reason. 

We need an agreed-upon definition of constituency to be able to judge the 
effects of semantic dependencies on syntactic grouping. 

I will follow an empirical notion of constituency, which is assumed to be 
the basis of competence: 


(15) Any surface string with compositional semantics that can be put to- 
gether phonologically by a native speaker is a constituent. 


As an empirical requirement, it says that whenever we observe an intonational 
grouping which is acceptable to native speakers, we must worry about 
its compositional meaning, and about how to deliver that meaning. As a theoretical 
requirement, it says no more than that every syntactic combination 
that mediates the phonology-semantics connection must have a semantic interpretation, 
otherwise we would just have a mixture of words rather than 
constituents, a point which Chomsky (1975: 206-211) was the first to make, 
back in 1955. 

This definition and its theoretical and empirical aspects seem to be shared 
by transformationalism and other frameworks as well. Consider for example 
Chomsky's criteria for phrase-markers, which embody constituency in his 
theory ever since its inception. 


(16) 1. The rule for conjunction Chomsky (1975: 210) 
2. Intrusion of parenthetical expressions 
3. Ability to enter transformations 
4. Certain intonational features. 


Chomsky goes on to argue on the next page that the first and the second 
criteria are actually theoretical, and can be subsumed by the third, but the 
fourth criterion is not. Therefore we are forced to have at least one theoretical 
and one empirical criterion for constituency, which is followed here as well. 

In a theory where structures are classified by subtyping, such as HPSG, 
constituency is directly built into the theory. Phrasal types are distinguished 
from lexical types by subtyping, with the further division of phrasal types 
as headed structures and others. Only the subtypes of the type phrase carry 
a feature called DAUGHTERS, subtyped as constituent structure (their con- 
struc), Pollard and Sag (1994: 31). Because all types have a semantic feature 
as well, it is incumbent on an HPSG grammar to show a head for the headed 
constituent structures, and no head for others, which establishes a good em- 
pirical test for constituency. 

The concept is manifest in multistructural theories of grammar such as 
LFG, as “order-free composition, requiring that the grammatical relations that 
the [grammatical] mapping derives from an arbitrary segment of a sentence 
be directly included in the grammatical relations that the mapping derives 
from the entire sentence, independently of operations on prior or subsequent 
segments," Bresnan and Kaplan (1982a: xliv). The nature of the mapping is 
the theoretical claim, and the inclusion of grammatical relations is the em- 
pirical test. LFG culminates the resolution of these multiple constraints on 
an independent level, called c(onstituent)-structure, with each level having its 
own well-formedness conditions. Their point extends to assigning a syntactic 
mapping to the following fragments, just like complete sentences, precisely 
because the theory can show how their grammatical relations can be included 
in the set of interpretations of the larger segment of which they are a part: 


(17) a. There seemed to ... Bresnan and Kaplan (1982a: xlv) 
b. ...not told that... 
c. ...too difficult to attempt to... 
d. ...struck him as crazy... 
e. What did he... 


In summary, there seems to be a consensus that constituency must have 
a theoretical foothold and an empirical testing ground, without which it 
seems hard to formulate a grammar. Using a variableless, monostratal, order- 
instigated syntax for this task, which is presented here, and its way of han- 
dling constituency, naturally brings to mind comparisons to syntax with vari- 
ables, most notably with transformationalism, which as its name suggests 
needs variables. 

Consider the two different analyses of the man who Mary loved, shown be- 
low. (18) is an analysis based on Steedman's Combinatory Categorial Gram- 
mar (Ades and Steedman 1982, Steedman 2000b). 


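(18) the            man   who               Mary         loved 
     (S/(S\NP))/N   N     (N\N)/(Sfin/NP)   S/(S\NP3s)   (Sfin\NP)/NP 
                                            -------------------------->B 
                                            S/NP 
                          --------------------------------------------> 
                          N\N 
                    --------------------------------------------------< 
                    N 
     -----------------------------------------------------------------> 
     S/(S\NP) 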


Figure 4 uses a recent version of transformationalism, the Minimalist Pro- 
gram, which started with Chomsky (1993, 1995). 

The analysis with variables, Figure 4, uses six primitives: move, merge, 
agree, check, lexical insertion, and argument structure. The last one ensures 
that we get a merge of loved and the syntactic variable -wh, rather than just 
loved, as in Mary loved deeply. Its scope is controlled by the governor +wh. 
Lexical insertion injects parts of words into the tree, and ensures for example 
that there is one copy of Mary. 

A structure-dependent but order-inspired theory of structure-building, that 
of Phillips (2003), appeals to order as the main thrust of its construction operation, 
and likewise uses several copies of words (first created then deleted 
under identity), plus the operations move, merge and the economy conditions 
on structures. It is not monotonically dependent on the syntactic types of the 
words in a sequence. 

Figure 4. Minimalist Program's primitives. 

The purpose of the book is to show that (18) uses only one primitive: 
Schönfinkel's juxtaposition. Every syntactic combination is local and adjacent. 
It is meaning-bearing, and phonologically realized. For example, B's 
syntacticization arises from its dependency structure, written after a colon, 
which I use for the time being to talk informally about semantics. 


(19) X/Y: f   Y/Z: g   Z: a  →  X: f(ga) 

We could not conceive a B semantics if the syntactic types were one of 
the following in (20). Either adjacency (20a-c) or dependency (20d-e) is 
violated in these configurations. 

(20) a. *X/Y  Y  Y/Z  →  X 
     b. *Y/Z  X/Y  Z  →  X 
     c. *X/Y  Y/Z  P/Q  Z  →  X  P/Q 
     d. *X/Y E 
     e. *X/Y  Y/W  Z  →  X 
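
How adjacency and type matching jointly license (19) can be sketched with a toy category algebra. This is an assumption-laden illustration, not the book's formalism: the class `Slash` and the functions `forward_apply` and `forward_compose` are hypothetical names, only rightward slashes are handled, and types are compared by identity.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """A functional category: result/arg (rightward) or result\\arg (leftward)."""
    result: "Cat"
    direction: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Slash]


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y  ->  X, for adjacent left and right."""
    if isinstance(left, Slash) and left.direction == "/" and left.arg == right:
        return left.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y/Z  ->  X/Z, the syntacticized B for adjacent left and right."""
    if (isinstance(left, Slash) and isinstance(right, Slash)
            and left.direction == "/" and right.direction == "/"
            and left.arg == right.result):
        return Slash(left.result, "/", right.arg)
    return None


x_y = Slash("X", "/", "Y")
y_z = Slash("Y", "/", "Z")
y_w = Slash("Y", "/", "W")

ok = forward_apply(forward_compose(x_y, y_z), "Z")        # as in (19)
wrong_order = forward_compose(y_z, x_y)                   # the order of (20b)
wrong_arg = forward_apply(forward_compose(x_y, y_w), "Z")  # the types of (20e)
print(ok, wrong_order, wrong_arg)
```

The failing cases return `None` because each rule inspects nothing but the two adjacent types.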


Syntactic types adhere to dependency by virtue of adjacency as well. From 
the derivational configuration A B → C, shown on the left below, which 
means the syntactic types A and B given in this order lead to the syntactic 
type C, we can also obtain the same result by assuming A=C/B and B=C\A: 

(21)  A  B        A=C/B  B        A  C\A=B 
      ----        --------        -------- 
       C             C               C 

For example, the English transitive construction ‘NP V NP’ spells a combinatory 
type for the verb as follows: ‘NP V NP’ → S, hence ‘V NP’ 
→ S\NP. Therefore ‘V’ → (S\NP)/NP. A phonological string α with 
the morphological type ‘V’ is known to syntax only by its syntactic type 
(S\NP)/NP. We write this as: 


(22) α := (S\NP)/NP 
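
The step from a configuration A B → C to the types A = C/B and B = C\A can be sketched as string construction. The helpers `paren`, `seek_right` and `seek_left` are hypothetical names, and the sketch assumes categories written as plain strings with explicit parentheses.

```python
def paren(cat: str) -> str:
    """Parenthesize a complex category so that slashes nest unambiguously."""
    return f"({cat})" if ("/" in cat or "\\" in cat) else cat


def seek_right(result: str, right: str) -> str:
    """From A B -> C, solve for A: A = C/B."""
    return f"{paren(result)}/{paren(right)}"


def seek_left(result: str, left: str) -> str:
    """From A B -> C, solve for B: B = C\\A."""
    return f"{paren(result)}\\{paren(left)}"


# 'NP V NP' -> S, hence 'V NP' = S\NP, hence 'V' = (S\NP)/NP.
verb_phrase = seek_left("S", "NP")
verb = seek_right(verb_phrase, "NP")

# Another solution to the same configuration groups 'NP V' first.
alternative = seek_left(seek_right("S", "NP"), "NP")

print(verb_phrase, verb, alternative)
```

Both solutions satisfy the word order; choosing between them takes evidence from surface constituency, not from the configuration itself.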


Other translations are possible, for example (S/NP)\NP for 'V', but 
this category is easily eliminated by the litmus test of syntax, surface con- 
stituency: (23a) is grammatical, therefore its surface constituents must be 
derivable with the verbal category assumptions. 


(23) a. Obelix (chases relentlessly) and (eats ferociously) the wild boars of 
        the Armorican forest. 
     b. eats           ferociously         eats           ferociously 
        (S/NP)\NP3s    (S\NP)\(S\NP)       (S/NP)\NP3s    (S/NP)\(S/NP) 
     c. John    fights     ferociously. 
        NP3s    S\NP3s     (S/NP)\(S/NP) 

A category such as (S/NP)\NP for transitives would not be consistent or 
complete, because we must assume a consistent and complete category for 
the adverbial as well. Compare (23b) and (23c). The adverbial assumption in 
the first alternative of (23b) would be unworkable with the verbal assumption, 
as shown. The second alternative on the right is workable, but it would be in- 
sufficient for the constituency in (23c). The remaining potential culprit is the 
verbal assumption in (23), which must be revised. The categories (S\NP)/NP 
and (S\NP)\(S\NP), respectively for the verb and the adverb, are consistent
and complete with respect to the observations of constituency above. 
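
The consistency argument above can be checked mechanically. What follows is a minimal sketch in Python, not part of the theory's apparatus. The class Slash, the helpers fwd and bwd, and the function combine are names assumed here for illustration. The agreement feature 3s is dropped, and the rule inventory (application plus backward composition, harmonic and crossed) is an assumption that suffices only for the adjacent pairs in (23).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Slash:
    """A functor category: result|arg, with '/' or '\\' as the slash."""
    result: "Cat"
    slash: str   # "/" seeks the argument to the right, "\\" to the left
    arg: "Cat"


Cat = Union[str, Slash]


def fwd(result: Cat, arg: Cat) -> Slash:
    return Slash(result, "/", arg)


def bwd(result: Cat, arg: Cat) -> Slash:
    return Slash(result, "\\", arg)


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Category of two adjacent constituents, or None if they do not combine."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, Slash) and left.slash == "/" and left.arg == right:
        return left.result
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, Slash) and right.slash == "\\" and right.arg == left:
        return right.result
    # Backward composition (harmonic or crossed): Y|Z  X\Y  =>  X|Z
    if (isinstance(left, Slash) and isinstance(right, Slash)
            and right.slash == "\\" and right.arg == left.result):
        return Slash(right.result, left.slash, left.arg)
    return None


S, NP = "S", "NP"
iv = bwd(S, NP)                       # S\NP, e.g. fights
tv = fwd(iv, NP)                      # (S\NP)/NP, e.g. eats
adv = bwd(iv, iv)                     # (S\NP)\(S\NP), e.g. ferociously

tv_alt = bwd(fwd(S, NP), NP)          # (S/NP)\NP, the rejected verb category
adv_alt = bwd(fwd(S, NP), fwd(S, NP)) # (S/NP)\(S/NP), the adverb it would need
```

Under these assumptions, combine(tv, adv) returns the verb's own category, so eats ferociously is a constituent that can coordinate as in (23a), and combine(iv, adv) returns S\NP for (23c). The alternative tv_alt fails with adv, and the adverb category that rescues it, adv_alt, fails with the intransitive iv.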

The argument structure arises from adjacency too. There is a systematic
relation between a syntactic type such as (S\NP)/NP of love and its
dependency representation λxλy.love′xy, which we can eta-normalize without
variables to love′_(e,(e,t)). Similarly, the S\NP of the intransitive love and
its semantics λx.love′x, which we can normalize to love′_(e,t), are
codeterminant.

A purported argument structure in the category (S\NP)/NP: λx.love′x is
universally disallowed, only because its eta-normalized version, love′, which
is variableless, could not give us a complete interpretation of the verb. There
are two syntactic slashes, therefore two syntactic arguments, hence we must
expect two lambdas (perhaps more, as in properties, but at least two,
because of the syntactic type). Although we can associate the variable x with
the ‘/NP’, rightly or wrongly, there would be no semantic counterpart of
‘\NP’ above, which is to say that we have no way of capturing its meaning
because we would have no way of knowing what syntactic objects (words,
phrases) it is an argument of by virtue of adjacency. This cannot be the
competent knowledge of the word love, whether it is love′_(e,(e,t)) or
love′_(e,t).

Both syntax and semantics work by juxtaposition. Indeed, semantics be- 
comes immediately available at every step of the derivation because of having 
the same primitive. I redraw the derivation of (18) below to show the lock- 
step assembly of semantics driven entirely by syntactic types. 


(24) the man who Mary loved

     the    := (S/(S\NP))/N : λPλQ.(the′x)and′(Px)(Qx)
     man    := N : man′
     who    := (N\N)/(S/NP) : λPλQλx.and′(Px)(Qx)
     Mary   := S/(S\NP3s) : λP.Pmary′
     loved  := (Sfin\NP)/NP : λxλy.loved′xy

     Mary loved               S/NP : λy.loved′y mary′
     who Mary loved           N\N : λQλx.and′(loved′x mary′)(Qx)
     man who Mary loved       N : λx.and′(loved′x mary′)(man′x)
     the man who Mary loved   S/(S\NP) : λQ.(the′x)and′(and′(loved′x mary′)(man′x))(Qx)


Notice that the process of lexical insertion into phrase-structural intermediate 
records (trees) is replaced by a process of bringing the self-contained type 
assignments of the meaning-bearing elements to the surface string. They can- 
not be copied, checked or governed, and there can be no late or early inser- 
tion. Such devices need structure-builders over and above order, as Phillips's 
(2003) work demonstrated. 
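
The lock-step assembly in (24) can be replayed as function application and composition. The following is a sketch under stated assumptions: lambda terms are modeled as curried Python functions, closed logical forms as strings, and the names B, X and slept are introduced here for illustration only (slept′ is a hypothetical predicate supplied to saturate the final category).

```python
def B(f):
    """Composition: B f g x = f (g x)."""
    return lambda g: lambda x: f(g(x))


X = "x"  # the variable name printed inside logical forms

# Lexical semantics as given in (24).
the = lambda P: lambda Q: f"(the' {X}) and' ({P(X)}) ({Q(X)})"
man = lambda x: f"man' {x}"
who = lambda P: lambda Q: lambda x: f"and' ({P(x)}) ({Q(x)})"
mary = lambda P: P("mary'")
loved = lambda x: lambda y: f"loved' {x} {y}"

# Each derivational step builds its semantics immediately.
mary_loved = B(mary)(loved)                        # S/NP
who_mary_loved = who(mary_loved)                   # N\N
man_who_mary_loved = who_mary_loved(man)           # N
the_man_who_mary_loved = the(man_who_mary_loved)   # S/(S\NP)

# Hypothetical predicate, not in the source, to complete the sentence.
slept = lambda x: f"slept' {x}"
```

No intermediate record is consulted at any step: every constituent is a Python value as soon as its two adjacent parts are combined, which is the sense in which the semantics is immediately available.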

It is a prediction of a lexical insertionless theory such as CCG that mor- 
phological and phonological assembly interact with grammatical computa- 
tion in limited ways, to affect the syntactic types only at the interfaces. This 
issue is related for example to Chomsky's “derivation by phase” (Chomsky
2001). CCG’s conjecture is that a phase has a very limited window of op- 
portunity, namely one meaning-bearing item in the string, regulated by its 
lexical syntactic type (Steedman 2005b). This makes "phase" synonymous 
with ‘a lexical item that can be spotted in a string, one with a syntactic type 
and a predicate-argument dependency structure’. 

In this sense, the theory of CCG is not derivationalist in its account of 
constituency and interpretation, because no condition can be predicated over 
derivations if there aren’t any intermediate records to predicate over. Rep- 
resentationalism, which is a term commonly used in transformational stud- 
ies to show the contrast in their way of management of intermediate results, 
such as Brody (1995), Epstein et al. (1998), is not helpful to characterize 
CCG either. It can best be characterized as a type-dependent (rather than 
structure-dependent), radically lexicalist approach to syntax which relies on 
adjacency as the only structure building primitive, and only in places where 
structure truly manifests itself: surface constituency and predicate-argument 
structure.¹⁵

CCG’s principle of adjacency is not an argument of theoretical simplicity 
or Occam’s razor. Chomsky’s point on the topic of theory choice is well- 
taken: 


“Thus it is misleading to say that a better theory is one with a more limited 
conceptual structure, and that we prefer the minimal conceptual elaboration, 
the least theoretical apparatus. [..] If enrichment of theoretical apparatus and 
elaboration of conceptual structure will restrict the class of possible grammars 
and the class of sets of derivations generated by admissible grammars, then it 
will be a step forward (assuming it to be consistent with the requirement of de- 
scriptive adequacy).” Chomsky (1972: 68) 


The program of CCG is bringing semantics into the explanation in a com- 
pletely syntactic type-driven grammar and its computation. If semantics can 
reduce the possible lexical categories hence possible grammars, without fur- 
ther auxiliary assumptions, then its role in the explanation might be consid- 
ered a complication in the theory for a good reason. (It would be a complica- 
tion because the semantic representation is now part of the knowledge we can 
collectively call a category, together with the syntactic type.) If a significant 
reduction can be shown, then a narrower theory is to be preferred. However, 
doing this the CCG way shifts the goals of linguistic theorizing from narrow- 
ing down the admissible phrase markers to understanding the limited nature 
of dependency and constituency despite the apparent flexibility in order and
structure. Hence the question is more complex than presented so far. 

The insistence on adjacency distinguishes CCG from theories which are 
otherwise similar in spirit in adopting lexicalism and the abandonment of 
transformations. For example, HPSG had in the past posited empty strings in 
the lexicon for topicalization and relativization (Pollard and Sag 1987), then 
moved towards the elimination of traces (Pollard and Sag 1994). LFG has 
this option too; cf. Kaplan and Bresnan (1995), Kaplan and Zaenen (1995). 
Type-logical grammar can assign types to empty strings and retract such as- 
sumptions under certain conditions, or stay away from this practice as it sees 
fit regarding semantics, e.g. Carpenter (1997).¹⁶ CCG has no such degree of
freedom. 

The notion of possible grammars can be equated with possible combina- 
tory categories when we insist on adjacency and radical lexicalization be- 
cause only lexical items can bear categories and the categories contain no 
variables. Combinatory constituency is the litmus test for such categories. A 
related cousin of juxtaposition called “wrap” does not provide a combinatory 
base, as we shall see in §5.1. 


Chapter 3 
The lexicon, argumenthood and combinators 


Let us now see how combinators can capture function-argument configura- 
tions as a consequence of juxtaposition, and without variables. This will give 
us a variableless lexicon. Then we move on to variableless syntax. First, some 
history of the variable. 

Peirce's (1870) elimination of variables predates Frege's decisive work
on clarifying the notion of variable, and Peirce was apparently unaware of
Frege's work. Frege's (1891) variableless technique was to represent for
example x² + x as ( )² + ( ). The notation, as he prophesied, did “not meet
with any acceptance” (Frege 1904: 114). His currying in Frege (1893) is
almost identical to what we have now, due to its adoption by Church (1936)
for lambda calculus. Frege's program aims to distinguish intensions such as
( )² + ( ) from extensions (values) such as λx.x² + x.

The two notations put together did not lend themselves to purely
adjacency-driven models of semantic object manipulation. Schönfinkel had
to appeal to Lukasiewicz-style prefix notation to facilitate variableless
combination by adjacency.

However, he did not use Lukasiewicz’s (1929) prefix operator, which
Quine (1967) symbolized as o, to represent x(yz) as oxoyz. He used the
parenthesized notation instead. Thus Quine (1967) is right to criticize Behmann
for adding the end material to the 1924 paper about the elimination of
parentheses, which Schönfinkel apparently had not intended as his agenda.
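
The difference between the two notations is easy to make concrete. The sketch below is illustrative only: an application is assumed to be a pair (function, argument), an atom is a string, and to_prefix is a name introduced here. It writes o for the application operator, following the symbol used in the text.

```python
def to_prefix(term):
    """Render an application tree in parenthesis-free prefix form."""
    if isinstance(term, str):       # an atom such as "x"
        return term
    func, arg = term                # an application: func applied to arg
    return "o" + to_prefix(func) + to_prefix(arg)


right_nested = ("x", ("y", "z"))    # x(yz)
left_nested = (("x", "y"), "z")     # (xy)z, i.e. xyz by left association
```

Here to_prefix(right_nested) yields oxoyz, the rendering mentioned above, while the left-nested term comes out as ooxyz. Both are unambiguous without parentheses, which is the property the prefix operator buys.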

It is sometimes useful to make a clarification about the whole practice 
of variable elimination. As Curry pointed out frequently (Curry 1929, 1963, 
Curry and Feys 1958), combinatory logic concerns itself with the elimination 
of variables from elementary theorems, but leaves open the question of their 
utility in epitheorems. Thus a foundation is set in which we can safely assume 
that bound variables, if used, are used only for expository or efficiency
purposes (because of the Church-Turing thesis and the equivalence of lambda
calculi and combinators; see Barendregt 1984). Steedman (1988, 1996a) suggests
that bounded constructions (passives, reflexives etc.) are one area in which a 
variable-friendly logical form in an otherwise variableless combinatory syn- 
tax might have evolutionarily arisen out of pressures for efficient processing. 

1. Adjacency and arity


We can now move toward a variableless lexicon in Curry's sense of
eliminating variables from fundamental theorems. An n-argument predicate f
can be uniquely represented as fⁿ if we wished. However, the arity declaration
of an object is an intrinsically combinatory property of it, therefore the fⁿ
notation would not do to establish the lexicon-syntax communication by order
alone. Curry and Feys's (1958) definition of power for combinatory objects
reveals the right combinatory source. We can define the arity of f as a
consequence of juxtaposition. It marks the arity of f as a combinatory prefix.


(1)  fⁿ  =def  f       for n = 0
               BⁿIf    for n > 0


Some manifestations of combinatory arity are exemplified below. 


(2) fabcde...                                                   (f⁰)
    B¹Ifabcde... = I(fa)bcde... = (fa)bcde...                   (f¹)
    B²Ifabcde... = BBBIfabcde... = I(fab)cde... = (fab)cde...   (f²)


Because of I, every abstraction is necessarily a function if there are
arguments. This is implicit in Schönfinkel’s notation §1(1).¹⁷

The notation translates to syntactic argument-taking directly; the power
of B in (1) is the number of slashes of f in its syntactic category. For B³If,
we get for example A/B/C/D for f where A is the result type of f, but not
A/(B/C)/D, because the second slash in the latter category would be for the
argument of B, not A. Similarly, if f is a zero-argument function (a constant),
then BIf or If would not faithfully reflect that it is not necessarily a functor;
it can be, say, A rather than A/B, hence the first clause of (1).
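
The arity prefix in (1) can be tried out directly. This is a minimal sketch, assuming curried one-argument Python functions as stand-ins for combinatory objects. The names power and mark_arity are introduced here, and power follows the Curry-Feys convention X¹ = X, Xⁿ⁺¹ = BXXⁿ, which gives B² = BBB as in (2).

```python
I = lambda x: x
B = lambda x: lambda y: lambda z: x(y(z))


def power(X, n):
    """Curry-Feys power for n >= 1: X^1 = X, X^(n+1) = B X X^n."""
    result = X
    for _ in range(n - 1):
        result = B(X)(result)
    return result


def mark_arity(f, n):
    """The object f^n of (1): f itself for n = 0, B^n I f for n > 0."""
    return f if n == 0 else power(B, n)(I)(f)


# A three-place predicate, curried, so that f a b c is f(a)(b)(c).
f = lambda a: lambda b: lambda c: ("f", a, b, c)
f3 = mark_arity(f, 3)     # B^3 I f
```

Applying f3 to three adjacent arguments gives the same result as applying f to them, so the prefix adds nothing but the arity declaration. A constant marked with n = 0 is returned untouched and is never turned into a functor, in line with the first clause of (1).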

The reason for going through the trouble of variable-free argument specifi- 
cation is to show that argument taking is just another manifestation of seman- 
tic dependency, and to show that the adjacency formulation of dependency 
finds a natural niche for it in syntax without being orthogonal to, or an aux- 
iliary assumption of, phrase structure. All of the combinators' behavior is 
describable solely by the adjacency of functions and arguments. S, B and I
etc. can take their arguments only if they are adjacent. The results are
predictable directly from their adjacency. The dotted material in for example
Bfab···d··· is “unreachable” to B, therefore uninterpretable by this B. The
object d cannot be an argument of this B, by the virtue of its nonadjacency.
The objects f, a and b must be the arguments of B because of their adjacency. 


(Schönfinkel-Curry arity)
The combinators B, S, I etc. are assumed to contain no vacuous abstraction
in their definition, i.e. all and only the arguments are specified. Hence
there is no version of B which is able to reach out and take the d above via
vacuous abstraction, say λx₁···λxₙ.x₁(x₂x₄) for some n > 3. This is not a
theoretical necessity because by definition any object is a combinator if it has
no free variables, including the ones with spurious abstractions. It is in this
sense that we take them as “building blocks” as Schönfinkel had called them;
all other combinatory definitions are illative.¹⁸

Now a single grammatical base, adjacency, explains all behaviors of 
argument-taking objects because we know from Curry and Feys (1958) that 
combinators have the same power as the lambda calculus. (This is somewhat 
tolerable in computing, but it presents problems to a linguistic theory. I say 
more on this in the closing words of this chapter.) 

The effect of unification of argument-specification and combinatory be- 
havior under adjacency might be quite revealing for the radical lexicalization 
of natural grammars. Most importantly, we get full interpretability of words 
and phrases, which is what argument specification is all about, for free. This 
result arises from supercombination, along with finite typeability in the lexi- 
con. 


2. Words, supercombinators and subcombinators 


For the purpose of understanding linguistic argument-taking by combinators,
the relation between combinatory terms and lambda terms requires a closer
look. For example, f(λx.g(hx)) indicates that x is not an argument of f but
of h. If f is the head functor of a word w, then this lambda term suggests
that x is not an argument of w but of some other word which f takes in its
domain. An example of such dependency is the bracketed substring in [what
you can] and what you must not count on.

Assuming λQ.?yQy for the semantics of what for simplicity, following
Hoyt and Baldridge (2008), Groenendijk and Stokhof (1997), the substring
encodes the dependency in (3a), but not (3b).


(3) a. what you can := λP.?y can′(P y you′)
    b. *λPλx.?y can′(P x you′)y

In other words, what you can is a one-argument function, not a two-argument
one. The variable y is a nonsyntactic argument of P. (P in this case
corresponds to count on, whose predicate-argument structure λx₁λx₂.con′x₁x₂ is
opaque to what and what you can.) The difference arises from the nature of
combinators and supercombinators.

All the combinators we have seen so far are supercombinators, with the 
exception of O and Y, to be defined below. Supercombinators can group their 
argument abstractions—lambdas—to the left to leave a lambdaless body. This 
seems to be a clear identification of a predicate-argument structure in a cat- 
egory, where the lambdas can be seen as the glue language for syntactic ar- 
guments. We will have a closer look at Y later because it is crucial for the 
debate on syntactic versus semantic recursion. 


The opaqueness in what you can arises from O. In combinatory parlance,
this combinator is not a supercombinator: Ofgh =def f(λx.g(hx)). The
semantics of what you can requires this combinator: O what′(you′ can′), where
what′ = λQ.?yQy. A preview of CCG's syntactic type-driven way of handling
this dependency is given below along with its semantic assembly (4). It makes
use of the syntacticized O rule in (5). I will justify the syntactic types of (5)
in the next chapter.


(4) what you can

    what  := S/(S/NP) : λQ.?yQy
    you   := S/(S\NP) : λf.f you′
    can   := (S\NP)/(S\NP) : λPλx.can′(Px)

    you can        S/(S\NP) : λP.can′(P you′)
    what you can   S/((S\NP)/NP) : λP.?y can′(P y you′)


(5) a. X/(Y/Z): f   Y/W: g   W/Z: h   →   X: f(λx.g(hx))   (O)
b. For some X/Y: h (Y) 
X/Y:h eo Jy -X/(XI/Y):Y h 
(X/Y).Z, 1 & Fn forn>0 
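
The semantics in (3a) and (4) can be reproduced with the definition Ofgh = f(λx.g(hx)). The sketch below makes several assumptions for illustration: lambda terms are curried Python functions, logical forms are strings, the binder ?y is rendered by the literal prefix "?y ", and con′ stands for the predicate of count on as in the text. The names O, B and count_on are introduced here.

```python
def O(f):
    """O f g h = f (λx. g (h x)); the inner lambda is not an argument of f."""
    return lambda g: lambda h: f(lambda x: g(h(x)))


def B(f):
    """Composition: B f g x = f (g x)."""
    return lambda g: lambda x: f(g(x))


what = lambda Q: "?y " + Q("y")                # λQ.?yQy
you = lambda f: f("you'")                      # λf.f you'
can = lambda P: lambda x: f"can' ({P(x)})"     # λPλx.can'(Px)

you_can = B(you)(can)                          # λP.can'(P you')
what_you_can = O(what)(you_can)                # λP.?y can'(P y you')

count_on = lambda x1: lambda x2: f"con' {x1} {x2}"   # λx1λx2.con' x1 x2
```

The value what_you_can takes exactly one argument, the two-place predicate, and it supplies y to that predicate itself. This mirrors the point that y is an argument of P rather than a syntactic argument of what you can.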


The significance of supercombinators for our purposes is the following. 


(i)   Inner lambdas cannot be the arguments of f in (5a), hence the ternary
      nature of O, although there is a lambda left on the right-hand side of
      its definition. Y is considered unary for the same reason.

      Therefore argumenthood is not a simple count of lambdas. It is a
      structural property because it requires the knowledge of inclusion
      asymmetries. This is one of the reasons why we need the notion of
      predicate-argument structure in addition to dependency, leading to PADS.

(ii)  Words whose semantics require combinators which are not
      supercombinators can be called subcombinators. Although they may look
      odd as words, such as what you can, with O semantics as shown above,
      they can in principle be lexicalized. For example, the Turkish
      equivalent of that I defended is indeed one word, savunduğum, which
      also has O semantics as we shall later see. They necessarily absorb an
      argument of their arguments because of an inner lambda abstraction, as
      in x of Ofgh = f(λx.g(hx)).

(iii) Not all subcombinators are finitely typeable. O has finitely many types,
      but Y does not (5b); notice the recurrence relation in Y. Finite
      typeability seems to be a prerequisite for compositional semantics of
      words because it translates to lexical representability.

(iv)  The words with subcombinator semantics must be distinguished from
      function words whose arguments may be opaque in a different way.
      For example, in languages where unbounded relativization is headed
      by a relative pronoun, such as that in English, we have the semantics
      λPλQλx.and′(Px)(Qx) for the relative marker. Here opaqueness arises
      from the fact that x substitutes for a property in Q and a participant
      in P, cf. the dog that the cat chased versus *Fido that the cat chased.
      There are no inner syntactic lambdas in λPλQλx.and′(Px)(Qx); it is
      indeed a supercombinator.

(v)   Words whose semantics demand a lexical use of combinators such as
      Y would be very odd. If there were such words, their combinatory
      behavior could not be read off entirely from their argument types
      because the syntactic contexts in which they can occur cannot be known
      fully by the native speaker; note the recursive variable F in (5b)
      above. It is tantamount to saying that knowledge of these words cannot
      be complete. We can conjecture that no such word exists in natural
      languages. Therefore,

      The only lexicalizable dependencies that are manifest in natural
      language are the finitely typeable ones. They are describable by
      supercombinators and subcombinators. This is a necessary but not
      sufficient condition. We shall see examples such as the combinator K.
      It is a finitely typeable supercombinator which is very unlikely to be
      operating in syntax or in the lexicon.


3. Infinitude and learnability-in-principle 


Clearly, a subset of combinators ought to be considered as potential combina- 
tory apparatus for a linguistic theory. The dependencies manifested by Y and 
K have not been attested in natural languages, and C might wreak havoc in 
grammar but perhaps not in the lexicon. CCG has a specific answer to this 
problem, which I summarize in Chapter 5. 

In a way, linguistics faces the same number of problems in meeting the com-
binators as physics faced with Roger Penrose's claim that classical
physics is Turing-computable: none.¹⁹ It did not make the Turing machine
a rival theory of classical physics, because it cannot predict anything unless 
physicists engage substantive constraints in their theory. Similarly, combina- 
tors cannot be a theory of language just because they happen to be the models 
of adjacency par excellence. This is where the linguistic theorizing begins for 
combinators. 

Three issues arise for any linguistic theory aspiring for formal adequacy 
and substantive restrictiveness: infinity, decidability and representability of 
natural language. The combinatory perspective suggests that, although all 
three issues are crucial, representability is the most decisive among the three, 
and it is not some informal notion of representability, but Turing repre- 
sentability. The reasons are as follows. 

The argument for the infinitude of human languages first appealed to 
Cartesian creativity and von Humboldtian romanticism, respectively: (a) 
there is a universal repertoire of thoughts with infinite ways to express them, 
and (b) individual languages materialize as the special manifestations of a 
universal human language. Chomsky's (1966) integration of these two lines 
of thought as the cornerstones of his generative grammar “of infinite use of 
finite means" carried the finiteness debate into the realm of formal methods. 

Generative grammar attempted to enumerate possible grammars, but the
earlier attempts were overshots. Putnam (1961) criticized the basic
innovation of generative grammar, transformations, as being able to generate
nonrecursive languages, and maintained that human languages are recursive.
Putnam's claim had been criticized as too performance-oriented, but Peters
and Ritchie (1973) argued from the perspective of competence grammars,
and could not find a nonarbitrary way of delimiting possible transformational
grammars to guarantee a constrained formalism.

Chomsky's theorizing shifted away from formal aspects by the early 60s,
and the debate on the undecidability of his formalism faded.²⁰ He claimed
that recursion is the basic trait of human language, for example Chomsky
(2000), Hauser, Chomsky and Fitch (2002). The notion of recursion is most
formally dealt with in mathematics and computing science, and the results
I summarize in §4.1 and §9.2 suggest that what Chomsky seems to have in
mind is everybody's assumption, that semantic recursion, i.e. recursion by
value, is real for all humans. Syntactic recursion, however, i.e. recursion by a
name or a label, is not necessary for this, and the lack of a Y-like behavior in
any natural language can be taken as the living proof of this result. Y is the
paradoxical combinator of Curry, and without it or its behavioral equivalent
such as Turing's U, syntactic recursion is not possible, as we shall see in §4.1.

Pullum and Scholz (2009) argue that giving up on recursion is not a mental 
block to creativity. After all, 10??? might be the number of possible sentences 
in human languages, and it does require a theory to sift through the search 
space to identify say English, even though the search space is finite. 

I am of course not suggesting that we take the easiest way out to satisfy
Gold's (1967) finding about learnability, by assuming that languages are
learnable because they are finite. In his “text” model, where the acquirer faces
the same conditions as the child, only finite languages can be learned. In the
other model, called the “informant”, any grammar up to and including that of
primitive recursive languages can be learned.²¹ The model requires a decider
to answer whether a string is in the language or not. Gold himself acknowledges
that it requires feedback about negative instances “by being corrected
in a way we do not recognize” (Gold 1967: 453).

The computationalist scenarios I outline in §9.5 suggest that there is
probably more indirect evidence than what is assumed by the complex innate
knowledge proposals. For example there is the possibility of the child
being wrong about what an utterance means, but being very explicit about the
syntax-semantics connection hypothesis, for example thinking that veggies
means dog′ when the word is uttered when there is a dog around, or that
veggies is an act like eating, with a syntactic type such as S/NP rather than NP
as the adult might have intended. The indirect evidence here might be the next
state of affairs where there are veggies but no dogs around, or no potential
for being forced to eat them, such as being pointed at in a grocery display
while sitting in a stroller. Infinitude seems to be a secondary concern in this
task. But learning “something more than the data” in the Humean sense does
prove critical; see §9.5 for discussion, where something more is claimed to be
the syntactic type.

Without too much of a worry about finitude, we can readjust the goals of 
linguistic theory to understand why we do not see some kinds of dependen- 
cies and constituencies in any language, whether they are finite or not. Free 
operation in syntax, and the codetermination of syntax and semantics in the 
form of a category, seem to suffice for this line of research. 

Now let us consider decidability. A weak argument arises from formal 
aspects, such as transformationalism not being able to deliver grammars 
that always decide. We do not know whether this is the reason why Chom- 
sky (1965) entertains the possibility of natural languages being potentially 
undecidable.²² One formalization of minimalist grammars, that of Stabler
(1997, 1999), suggests that Chomsky’s recent grammars stay well within re- 
cursive languages. 

A stronger argument is from languages rather than grammars. A naive ver- 
sion of the argument might proceed as follows: human languages are decid- 
able because every speaker can decide whether any expression is a sentence 
in her language. Differences of opinion would not count because the speak- 
ers would have to make up their minds in the first place to be able to agree or 
disagree. 

What makes them decidable is a meta-theoretical question, but it would 
not lead to a theory of language if it fails to engage substantive constraints 
in a linguistic theory. Levelt (1974) suggests that one such constraint is the 
learnability-in-principle, which amounts to saying that acquirable grammars 
are the primitive recursive ones. This is one of the running themes of this 
book, and it requires a closer look at substantive constraints on grammars, 
which we will narrow down to a theory of possible lexical categories. We 
know that a concocted language in which every sentence has an even number 
of words is decidable, yet there is no such language and we can be certain 
that there will never be. So what is unnatural about this language? Clearly, 
no amount of formalization can give us the desired answer, because the very 
word natural requires that we situate the formal apparatus in some complex 
system with interactions, i.e. a system with substantive constraints. 
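
The concocted language is decidable in the most trivial way, which is what makes its unnaturalness a substantive rather than a formal matter. A sketch of a decider follows, assuming that words are whitespace-delimited and that the empty string counts as a sentence of zero words; the function name is introduced here.

```python
def in_even_language(sentence: str) -> bool:
    """Total decision procedure: halts on every input with yes or no."""
    return len(sentence.split()) % 2 == 0
```

The procedure never consults a lexicon, a category or a dependency, so nothing about its decidability tells us why no human language looks like this.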

We can also entertain the possibility that human languages may be
Turing-undecidable but Putnam-Gold decidable. Putnam-Gold (1965)
machines are Turing machines that can change their minds (if you pardon
the expression) as the computation develops. Thus, for a known
Turing-undecidable problem, a Putnam-Gold machine can output a “no” before
computation begins, and then output a “yes” or another “no” depending on the
state in which it halts. If it never does, we still have an answer.²³ It does
not follow that any undecidable problem can be modeled that way. Take for
example the question:


(6) What is the next real number after n? 


The argument of decidability for language must remind us that language is 
not posing that kind of a question to us, even though the question may be very 
relevant to the semantics out there, where meaning cannot be determined by 
language. 
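
The mind-changing behavior of a Putnam-Gold machine described above can be sketched for the halting question. The code is an illustration under loud assumptions: halted_by is a hypothetical stand-in for a step-bounded simulation of some computation, limiting_answers is a name introduced here, and the cutoff max_steps exists only so that the sketch terminates, whereas the machine in the text has no such bound. The answer that counts is the last one produced.

```python
from typing import Callable, Iterator


def limiting_answers(halted_by: Callable[[int], bool],
                     max_steps: int) -> Iterator[str]:
    """Yield the current guess to 'does it halt?' as the simulation proceeds."""
    yield "no"                        # the guess before computation begins
    for step in range(1, max_steps + 1):
        if halted_by(step):
            yield "yes"               # the mind is changed, at most once
            return
        yield "no"                    # no evidence of halting yet


# Hypothetical computations: one halts at its third step, one never halts.
halts_at_three = list(limiting_answers(lambda n: n >= 3, 10))
never_halts = list(limiting_answers(lambda n: False, 4))
```

For the computation that halts, the sequence of guesses ends in "yes"; for the one that does not, every guess is "no", so in the limit there is an answer even though no single step certifies it. Nothing comparable is available for a question such as (6), where there is no sequence of finite guesses to converge.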

This brings us to the final issue, that of finite versus transfinite rep- 
resentability. There is another argument of undecidability that we should 
take into account in this regard, that of Hintikka (1977). He uses the se- 
mantic criterion of synonymy, of interchangeability of any-expressions with 
every-expressions in English, which he shows to be not even recursively
enumerable. Bresnan and Kaplan (1982a: xliv) comment that “If Hintikka's
argument is correct, then semantics must diverge from syntax in a
fundamental way, as he observes.” Remember also Quine's (1951) warning that
the notion of synonymy brings with it other problematic concepts such as
analyticity.

We could also speculate whether the problem as stated by Hintikka is 
Turing-representable in the first place. Bolinger (1968: 234) offers another 
linguistic perspective in this regard, which suggests that we might start with 
questioning Hintikka's experiment and its implications for the nature of se- 
mantic representation: “Practically speaking, there is no such thing as an 
identical synonym. The language demands its money's worth from every 
word it permits to survive." 

Where does synonymy stop, if it exists? (Note that in the work cited above
and in the follow-up Hintikka 1980, the test requires the any-substituted
sentence to be grammatical, and to contrast in meaning.) In this continuous
space of similarity, we can also include problems that are not
Turing-representable. For example, if and at what level can we say that a cat
is sitting on the mat? At the folk science level or ordinary language, with
some nominal understanding of sitting, we can test this hypothesis, but at the
quantum level? Some of the quanta of the cat might be communicating with the
mat to a level we might consider touching, but surely not all of them. How
that experiment differs from the synonymy experiment is not clear. (See
Higginbotham 1982 for another kind of objection, that taking logical
equivalence as a sufficient condition for sameness of meaning is problematic.
Quine's demonstration of circularity of analyticity and synonymy presents a
conundrum for semantic criteria as well.)

Thus any criterion of decidability ought to be syntactic and combinatorial, 
otherwise we are in a domain much like the real numbers, and we can forget 
about a combinatorial base for language. 

The moral of the thought experiment is that it pays to keep the prob- 
lem combinatorial by sticking to a syntactic criterion of (un)decidability. We 
know some realizable classes of formal machines to see what kind of compu- 
tational resource management we need to capture the kinds of dependencies 
we see in natural languages. We have no such hope as yet for transfinite repre- 
sentations which are implicit in (6). The notion of representability is dubious 
in that domain. 

In this context, limited noncontext-freeness of human languages formally 
argued for by Shieber (1985) and Joshi (1985) provides a research agenda 
in which the limited nature of the automaton itself is the explanation for the 
limited kinds of dependencies, rather than extra assumptions or stipulations. 
This also cuts down severely the degrees of freedom in theorizing because 
limited computational resources can be called in for help in a hypothesis. In
this way of thinking, going the syntactic route all the way to undecidability
would not change the underlying syntactic machinery; it would just mean that
the source of undecidability might be the lexicon, such as a word with Y or 
WWW semantics. 

Thus, Turing representability in the abstract is the key to be able to even 
talk about the syntactic manifestation of semantic dependencies. Words with 
syntactic dependencies are the observables on which we can theorize about 
semantics. Decidability and finiteness are secondary issues. 

That of course does not entail that the biological substrate of the limited 
automaton is the answer to our combinatorial problems. There is a very likely 
possibility that the (human) brain is not a sequential computer like the Turing 
machine. For all we know, the underlying cognitive mechanism for language 
may not be language-specific at all. And this is where the linguistic theo- 
rizing stops, in case the warnings of Sandra (1998) about what linguists can 
and cannot say about human language processing mechanisms are not clear 
enough, with our current level of understanding. Remember the debate in 
the 1990s about the psychological reality of traces and empty categories. For 
every experiment which proved the reality of such elements (see Zurif 1995, 
Gibson and Hickok 1993), there was a counter-experiment which proved their
nonexistence (e.g. Pickering and Barry 1991, Pickering 1993). 

This takes us back to variables in theorizing. Traces and empty categories
are syntactic variables in need of binding or government. Why eliminate them
when they are so convenient to our understanding of argument-taking? Computing
scientists face the same predicament for different reasons. The computing
story is quite revealing, but I leave it to programming language theorists
to tell that story.

In this book I will stick to the linguistic story. One of the most striking
empirical observations of 20th-century linguistics is that parsing is a
reflex. (Try turning it off if you are a skeptic, and imagine someone saying the
ineffable as you try to shut yourself down.) It is tempting to say that we
could import computing's success with variableless interpretation to account
for the reflex-like behavior of knowledge of language in action (the key word
here is like, because the metaphor seems to fail in predictable ways in, for
example, aphasia and autism).

A less speculative answer is that the kind of combinatorics that is re- 
vealed before us in the form of syntacticized combinators sets up a base 
on which substantive theories can be built to predict possible linguistic cat- 
egories, therefore possible languages. The adjacency base of semantics di- 
rectly translates to adjacency syntax when we eliminate variables from fun- 
damental theorems. 

The interesting turn of variableless theorizing with combinators is that 
not only do they suggest a formal source for the combinatory possibilities 
in languages, they make the combinations (constituents) directly and
immediately interpretable if the ingredients happen to have semantics. That is
the bread and butter of a competence grammar, and we get a modeling tool in 
which syntax and semantics coconstrain possible lexical categories to provide 
a substantive base. 

And everything does have semantics, including the so-called dummies (for
example the it in It seems to rain), the accusative case and function words
such as that, to etc., once we readjust our semantic radar. The purpose of
the book is to show an attempt at that model-building process in detail.


Chapter 4 
Syntacticizing the combinators 


The combinators were originally intended to deal with functions. For them to
do syntactic-semantic (i.e. grammatical) work, we need their faithful
translation into syntactic objects so that the semantic dependencies they
symbolize are directly imported into syntactic dependencies. This is what I mean by
syntacticizing the combinators.

The reader might object that what I call “functions” are syntactic objects, 
because lambda calculus and variableless combinators seem to manipulate 
them by syntactic rules. 

They may be called syntactic objects of a domain theory, i.e. a name for a
collection of objects, but they would not be the syntactic objects of a linguistic
theory. Consider the same problem (levels of abstraction) for the theory of
lambda calculus. It has a direct denotational semantics for any lambda expression:
for example, x denotes all values of x in an environment e, and λx.M denotes
all values denoted by M when the free occurrences of x in M get some value,
say a. Lambda terms are its syntactic objects, and sets-as-denotations are its
semantic objects. (See Barendregt 1984, Stoy 1981 for a full treatment of
denotation and its relation to the syntax of lambda calculus.)

We face the same levels-of-abstraction problem in combinatory linguistics.
Although a compositional meaning of the phrase love hurts could be
given as BAhurt'love' if we wished, this must arise from words as syntactic 
objects, since we cannot communicate combinatory thoughts as combinatory 
thoughts. (If you are not convinced, try conveying the meaning of love hurts 
without words, in a medium in which you must also be able to convey the 
meanings of: I believe love hurts. Mary claims I believe love hurts. The man 
in the corner claims Mary thinks I believe love hurts. etc.) 

This brings us to the ontology of objects in a linguistic theory. CCG’s han- 
dling of dependency is different from that of dependency grammars, where it 
is taken as an asymmetric relation among words (syntactic objects) in a string. 
In CCG, the dependency relation is defined over semantic objects, but since 
the observables are syntactic objects, the relation must be mediated by syntac- 
tic types. This might be considered a complication in the theory in Chomsky’s 
sense noted earlier, but it is for a good reason: it can give us predictions about 
surface constituents and their immediate interpretability. 

We first syntacticize application. The slash ‘/’ is the syntactic counterpart
of function application, which is made explicit in Schönfinkel-Curry arity
§3(1), where the power of B translates to the number of slashes for arguments.
We write B¹If as A/B: f. The syntactic type of f states that it is syntactically
a function from B to A.

We can now syntacticize the semantic dependency manifested by juxtapo- 
sition fa: 


(1) X/Y: f   Y: a → X: fa   (application)


‘→’ is the syntactic counterpart of the reduction rule, viz. beta-conversion.
There is no restriction that Y be slashed or slashless. This follows from the
semantics of application, which is (fa) but not necessarily f(Ia).

We write (2) syntactically to mean that the syntactic objects ω1 and ω2,
with categories A/B: f and B: a, capture the semantic dependency fa in their
syntactic types.


(2) ω1    ω2
    A/B   B
    ---------
        A

Argument-taking objects such as f above are curried functions. Thus every
such f takes one argument at a time. Its syntactic type cannot be slashless
because, if it could, we could write application as (3) as well (a ‘*’ in a rule
decoration indicates ill-formedness).


(3) X: f   Y: a → X: fa   (*application)


There is nothing in the ingredients of the rule (3) that says f is the function 
and a is the argument, yet the result requires it. The rule is not compositional 
as it stands. The X/Y type for B¹If forces a function interpretation on the
syntactic side as well, hence the rule (1). 

The syntactic type of BⁿIf has n slashes, as in X/₁ ··· /ₙ Y. The last slash
is the one relevant to (1), because the left-associativity of juxtaposition
naturally translates to the left-associativity of the slash. X/₁ ··· /ₙ Y is the
same as (X/₁ ···)/ₙ Y.

The application rule cannot be (4a) either because the semantic depen- 
dency is fa, not af. (4b) fails to capture the dependency of f's argument 
type and a. Z cannot be an arbitrary argument type; it must be Y. 


(4) a. Y: a   X/Y: f → X: fa   (*application)
    b. X/Y: f   Z: a → X: fa   (*application)

Thus the only syntacticized rule of application that translates the semantic
dependencies to syntactic dependencies without further assumption is (1). We 
can write (1) as (5) because of this result, and fully syntacticize it. 


(5) X/Y   Y → X   (application)
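
The reasoning that singles out rule (1) can be checked mechanically. What follows is only a minimal Python sketch under stated assumptions: the names Slash and apply_rule, the pairing of a type with its semantics as a tuple, and the sample items are all invented here for illustration and are not part of the theory's notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slash:
    """A slashed syntactic type X/Y: a function from Y (arg) to X (result)."""
    result: object
    arg: object

def apply_rule(left, right):
    """Rule (1): X/Y: f  Y: a -> X: fa.

    Each item is a (syntactic type, semantics) pair. Returns None when the
    configuration is not an instance of the rule.
    """
    left_type, f = left
    right_type, a = right
    # The functor must be slashed (excluding (3)), must come first (excluding
    # (4a)), and its argument type must match its neighbour (excluding (4b)).
    if isinstance(left_type, Slash) and left_type.arg == right_type:
        return (left_type.result, f(a))
    return None

# Hypothetical items, for illustration only.
functor = (Slash("X", "Y"), lambda a: ("f", a))
argument = ("Y", "a")
```

With these definitions, apply_rule(functor, argument) yields the pair of X and the term fa, whereas swapping the two items, changing the argument's type, or using a slashless functor yields None.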


Table 1 lists all the combinators which Curry and Feys (1958) considered 
more or less basic. Smullyan (1985) retold the story of combinators as talking 
birds, presumably anticipating their natural fit with language. The names
in the third column are Smullyan's birds. We shall syntacticize them—and 
more—one by one. 


Table 1. Basic combinators


Combinator   Definition                                  Smullyan's bird

I            Ix = x                                      Identity bird
Y            Yx = y = xy, for some y depending on x      Sage bird
K            Kxy = x                                     Kestrel
T            Txy = yx                                    Thrush
W            Wfx = fxx                                   Warbler
B            Bxyz = x(yz)                                Bluebird
C            Cxyz = xzy                                  Cardinal
S            Sxyz = xz(yz)                               Starling
Φ            Φxyzw = x(yw)(zw)
Ψ            Ψxyzw = x(yz)(yw)
J            Jxyzw = xy(xwz)                             Jay
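
For readers who want to experiment, the definitions in Table 1 can be transcribed as curried functions. This is a sketch in Python, chosen here only for convenience; the spellings Phi and Psi and the helper functions pair, inc and dbl are assumptions of the example. Y is left out because Python evaluates arguments eagerly, so a literal transcription of it would not terminate.

```python
# Each combinator takes one argument at a time (currying).
I = lambda x: x
K = lambda x: lambda y: x
T = lambda x: lambda y: y(x)
W = lambda f: lambda x: f(x)(x)
B = lambda x: lambda y: lambda z: x(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)
S = lambda x: lambda y: lambda z: x(z)(y(z))
Phi = lambda x: lambda y: lambda z: lambda w: x(y(w))(z(w))
Psi = lambda x: lambda y: lambda z: lambda w: x(y(z))(y(w))
J = lambda x: lambda y: lambda z: lambda w: x(y)(x(w)(z))

# Hypothetical helper functions used to observe the behaviour.
pair = lambda a: lambda b: (a, b)
inc = lambda n: n + 1
dbl = lambda n: 2 * n
minus = lambda a: lambda b: a - b

examples = {
    "K": K(1)(2),                    # 1: the second argument is deleted
    "W": W(pair)(1),                 # (1, 1): the argument is duplicated
    "C": C(pair)(1)(2),              # (2, 1): the arguments are swapped
    "S": S(pair)(inc)(5),            # (5, 6): 5 is shared by pair and inc
    "Phi": Phi(pair)(inc)(dbl)(5),   # (6, 10): 5 is shared by inc and dbl
    "Psi": Psi(pair)(inc)(1)(2),     # (2, 3): inc is shared by 1 and 2
}
```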


1. Unary combinators 


The first unary combinator is I. We can syntacticize it as (6). Unary rules are 
simple correspondences without combination, which we write with a double 
arrow. 


(6) X/Y: a ⇒ X/Y: Ia   (I)

At first sight I might look superfluous because it adds nothing to the inventory
of semantic objects or syntactic types. It does crucial work on the lexical
side when we want to ensure that an argument of an object is an argument-taking
object itself. On the syntactic type, such constraints translate to requiring
a slashed category. For example, f can be typed A/(B/C)/(D/E) if both
arguments are unsaturated functions (remember that currying will take care
of the arity of B and D). Thus the following purported syntacticization of I
does not import the semantic property that whatever a is, Ia is necessarily a
syntactic and semantic function.


(7) X: a ⇒ X: Ia   (*I)


The other unary combinator, Y, which was discovered by Curry, is the
epitome of recursion, and rightfully established him as the father of functional
programming by the 1970s. For example, YK deletes infinitely many
objects. Curry and Feys (1958) called it the paradoxical combinator because
it captures Russell's paradox nicely. It is better known as the fixpoint combinator,
which allows recursive programs to be written without variables or
names. Recall that Y behaves the following way: Yh = h(Yh).

Not surprisingly, Y's syntacticization fares no better than infinite regress 
in semantics, and leads to an infinite schema: 


(8) For some X/Y: h                          (Y)
    X/Y: h ⇒ F₀ = X/(X/Y): Yh
    (X/Y)/Fₙ₋₁ ⇒ Fₙ   for n > 0


What makes the syntacticized Y syntactically recursive is the recurrence
relation Fₙ not having the same result as its argument, as in X/(X/Y). This
observation will be crucial in the following chapters.

We can see the syntactically recursive behavior of Y in (9), for a hypothetical
word ω.


(9) ω
    X/Y: h
    F₀ = X/(X/Y): Yh
    F₁ = (X/Y)/F₀ = (X/Y)/(X/(X/Y)): h(Yh)
    F₂ = (X/Y)/F₁ = (X/Y)/((X/Y)/(X/(X/Y))): h(h(Yh))
    F₃ = (X/Y)/F₂ = (X/Y)/((X/Y)/((X/Y)/(X/(X/Y)))): h(h(h(Yh)))

The property that saves the infinite expansion from unwarranted undecidability
is what computing scientists call lazy evaluation, which is to avoid
evaluating an argument in normal order until it is demanded by its function.
It is a consequence of the Church-Rosser (1936) theorems. No one has identified
a word in any language that requires the second derivation line above.
Thus, although Y can be kept under control by lazy evaluation, no such
dependency seems manifest in languages.
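
The difference that delayed evaluation makes to Yh = h(Yh) can be observed directly. The following Python sketch is an illustration only: Python is an eager language, so laziness is merely simulated by wrapping the recursive occurrence in a function, and the names eager_Y, lazy_Y and step are assumptions of the example.

```python
def eager_Y(h):
    # Literal transcription of Yh = h(Yh). Python evaluates the argument
    # eager_Y(h) before calling h, so the expansion never stops on its own
    # and the interpreter gives up with a RecursionError.
    return h(eager_Y(h))

def lazy_Y(h):
    # The recursive occurrence is wrapped in a function, so it is expanded
    # only when h actually demands it.
    return h(lambda x: lazy_Y(h)(x))

# A hypothetical h: one step of factorial, with the recursion left to Y.
step = lambda self: lambda n: 1 if n == 0 else n * self(n - 1)
factorial = lazy_Y(step)
```

Here factorial(5) evaluates to 120, because each further copy of h is unfolded only when the previous one asks for it, whereas eager_Y(step) fails before producing any function at all.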


2. Binary combinators 


Let us now consider Schönfinkel's binary combinators T and K. T can be
syntacticized as (10). By T's semantics, viz. Tab = ba, we know that b is the
function and a is the argument.


(10) Y: a   X/Y: b → X: Tab   (T)


We cannot have (11) as the syntactic reflexes of T. The overall syntactic 
type is that of b, viz. X, which is not guaranteed in (11a). (11b) fails to capture 
T semantics because a ≠ Ta. T wants the function after the argument.


(11) a. Y: a   X/Y: b → Z: Tab   (*T)
     b. X/Y: b   Y: a → X/(X/Y): Ta   X/Y: b   (*T)


T's syntacticization is completed once the semantic dependencies are di- 
rectly reflected in the syntactic types. We can rewrite (10) without semantic 
objects from now on: 


(12) Y   X/Y → X   (2T = T)


We can carry over the X/Y of (12) to the right to fully syntacticize the 
unary version of T: 


(13) Y ⇒ X/(X/Y)   (1T)


What allows us to do this is the asymmetry of juxtaposition inherent in
Schönfinkel's interpretation, that the sequence ab is not the same as the sequence
ba, thus Y X/Y is not the same as X/Y Y. Therefore, carrying over
the Y in (12) to the right, for example as X/Y: b ⇒ X\Y: b, would be wrong,
whereas X/Y: b ⇒ X\Y: λx.bx is fine. (The backslash attempts to keep the
relative order of X/Y and Y.) The equivalence of the first case would imply
ab = ba necessarily. The relation must be mediated, and Tab = ba is a way
of doing that.

We can see the effect of importing the mediation to syntactic types in the 
following examples: (14a-b) embody T semantics, whereas (14c) does not. 


(14) a. ω1         ω2         b. ω1   ω2          c. ω1   ω2
        Y          X/Y           Y    X/Y            Y    X\Y
        --------- T              ----------- T       ----------- app
        X/(X/Y)                       X                   X
        ------------------ app
                X

However, there is a systematic relation between the forced T semantics of 
the kind in (14a—b), and optional T semantics in (14c). This is shown in (15). 
From this perspective, T can be seen as the application of an argument as a 
function in one direction, to a function which looks for an argument of that 
kind in the other direction. 


(15) ω1            ω2
     Y: a          X\Y: f
     ------------ T
     X/(X\Y): Ta
     ------------------------ app
     X: Taf = fa

It is called type raising for this reason, which necessarily involves applicative
configurations:

(16) X/Y  Y → X     Y ⇒ X\(X/Y)     (type raising)
     Y  X\Y → X     Y ⇒ X/(X\Y)


The process is order-preserving, and relaxing this property results in per- 
mutation closure (Moortgat 1988a). The optionality of T proves to be a nec- 
essary degree of freedom in the account of flexible constituency, as we shall 
see in Chapter 5. 
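
The relation between an argument applying "as a function" and ordinary application, as in (15), can be sketched in Python. The encoding below is an assumption of this example (the class Fn, the direction markers and the toy items are hypothetical); it only illustrates that raising Y to X/(X\Y) and then applying forward gives the same result as applying X\Y backward.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn:
    """A directional type: '/' seeks its argument to the right, '\\' to the left."""
    result: object
    slash: str
    arg: object

T = lambda a: lambda f: f(a)      # Taf = fa

def forward_apply(left, right):
    """X/Y: f  Y: a -> X: fa"""
    (ltype, f), (rtype, a) = left, right
    if isinstance(ltype, Fn) and ltype.slash == "/" and ltype.arg == rtype:
        return (ltype.result, f(a))
    return None

def backward_apply(left, right):
    """Y: a  X\\Y: f -> X: fa"""
    (ltype, a), (rtype, f) = left, right
    if isinstance(rtype, Fn) and rtype.slash == "\\" and rtype.arg == ltype:
        return (rtype.result, f(a))
    return None

def raise_forward(item, x):
    """Y: a => X/(X\\Y): Ta, the order-preserving raising used in (15)."""
    y, a = item
    return (Fn(x, "/", Fn(x, "\\", y)), T(a))

# Hypothetical items, for illustration only.
arg_item = ("Y", "a")
fun_item = (Fn("X", "\\", "Y"), lambda a: ("f", a))

plain = backward_apply(arg_item, fun_item)
raised = forward_apply(raise_forward(arg_item, "X"), fun_item)
```

Both plain and raised come out as the type X paired with the term fa, which is the sense in which T semantics is optional in such configurations.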

K's syntacticization is straightforward because it does not follow from a 
semantic dependency between its arguments: 


(17) X: a   Y: b → X: Kab = a   (K)


K's power of deletion is unmatched by any of the combinators in Table 1, 
therefore it is not interdefinable by these combinators or juxtaposition. Its 
unary version serves to show its formidable powers, by freely deleting the 
syntactic dependencies of any Y: 


(18) X: a ⇒ X/Y: Ka = λb.a   (1K)

The last binary combinator in Table 1 is W. With semantics Wfa = faa,
it can behave incessantly like Y in certain circumstances such as WWW. By
definition, f requires two arguments. We can syntacticize it as follows:


(19) (X/Y)/Y: f   Y: a → X: Wfa   (W)


It would be wrong to syntacticize it as below. (20a) would turn a one- 
argument f into a two-argument f. (20b) would not be compositional: there 
is no indication that the second argument—Y—is reduced to the first argu- 
ment Z, hence the semantic dependency of W is not wholly reflected in the 
syntactic types. 


(20) a. X/Y: f   Y: a → X/Y/Y: f   Y: a   Y: a   (*W)
     b. (X/Y)/Z: f   Z: a → X: Wfa   (*W)


Carrying over the Y from the left-hand side of (19) to the right-hand side, 
and writing the remainder as W capture the semantics of W (21a), which we 
can fully syntacticize as in (21b). 


(21) a. (X/Y)/Y: f ⇒ X/Y: Wf = λa.faa
     b. (X/Y)/Y ⇒ X/Y   (1W)


The reader will note the lavish use of resources by W, and wasteful K. 
When applied to semantic objects, say K fa to waste a, or W fa to bring an- 
other a out of a hat, this may look tolerable. But when the objects in question 
are syntactic objects, namely words, resource insensitivity takes on a whole 
new meaning. We shall see in subsequent chapters that resource sensitivity 
does not necessarily follow from adjacency (witness K), therefore exclusion 
of W or K from syntax must be scrutinized, rather than assumed because of 
their resource insensitivity. 


3. Ternary combinators 


We now turn to combinators with three arguments. The one with the sim- 
plest semantics is B the compositor, which embodies the composition of two 
functions: Bfga = f(ga). We can syntacticize it as follows: 


(22) X/Y: f   Y/Z: g   Z: a → X: f(ga)   (B)

Notice that, by definition, f and g must both be argument-taking objects
because they occupy the functor position. This can be made more explicit
by writing their semantics as B(If)(Ig)a = f(ga), of which (22) is a direct
translation with slashes.

For us to get the same B-dependencies as syntactic dependencies, the following
must be eliminated; f must depend on a because it depends on g, which
depends on a:


(23) X/W: f   Y/Z: g   Z: a → X: f(ga)   (*B)


B's syntactic manifestation as (22) is redundant because of the primitive 
(juxtaposition). This effect can be seen below where the task of B is done by 
two applications of the primitive on the right. This notion of redundancy will 
be crucial in Chapter 5 where we choose the free combinators for syntax. 


(24) ω1    ω2    ω3          ω1    ω2    ω3
     X/Y   Y/Z   Z           X/Y   Y/Z   Z
     ---------------- B            ---------- app
            X                          Y
                             ---------------- app
                                    X


The following manifestation of B, in which the right edge component 
of (22) is carried over to the right-hand side, is nonredundant. We can 
take (25b) to be the syntacticization of the semantic dependencies in (25a). 


(25) a. X/Y: f   Y/Z: g → X/Z: Bfg = λx.f(gx)
     b. X/Y   Y/Z → X/Z   (2B)

The following translations of (22) are wrong, because the redundancy due
to ternary application is purportedly eliminated by carrying over the middle
argument to the right. In Bfga, B's semantics is lost if g is after a.


(26) a. X/Y: f   Z: a → X/(Y/Z): f(g)   Y/Z: g   (*B)
     b. X/Y: f   Z: a → X/Z: λx.fx   Y: ga   (*B)
     c. X/Y: f   Z: a → X/(Y/Z): λg.f(ga)   (*B)


The adjacency constraint on f,g,a in Bfga is violated in the following 
example: Z is unreachable to X/Y and Y/Z to be interpreted by them. It would 
be a nonadjacency semantics for B. 


(27) X/Y: f   Y/Z: g   W: h   Z: a → X: f(ga)   W: h   (*B)


Thus the only nonredundant syntacticization of B which preserves the se- 
mantic dependencies is (25b). We can produce the unary version from (25b) 
as well, which will help us to simplify the syntacticization of other combi- 
nators. The right periphery of the left-hand side in (25b) can be carried over 
to the right-hand side as long as we maintain the right order of arguments, as 
in (28). This is what Curry and Feys (1958) called (Bi. 


(28) X/Y ⇒ (X/Z)/(Y/Z)   (1B)
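
The redundancy of ternary B shown in (24), and the nonredundant binary rule (25), can be illustrated with a small Python sketch. The encoding is hypothetical (the names Slash, app and compose, and the sample items, are assumptions of this example), and only rightward slashes are modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slash:
    """A type X/Y with result X and argument Y."""
    result: object
    arg: object

def app(left, right):
    """X/Y: f  Y: a -> X: fa"""
    (ltype, f), (rtype, a) = left, right
    if isinstance(ltype, Slash) and ltype.arg == rtype:
        return (ltype.result, f(a))
    return None

def compose(left, right):
    """(25): X/Y: f  Y/Z: g -> X/Z: Bfg"""
    (ltype, f), (rtype, g) = left, right
    if (isinstance(ltype, Slash) and isinstance(rtype, Slash)
            and ltype.arg == rtype.result):
        return (Slash(ltype.result, rtype.arg), lambda x: f(g(x)))
    return None

# Hypothetical items, for illustration only.
f_item = (Slash("X", "Y"), lambda y: ("f", y))
g_item = (Slash("Y", "Z"), lambda z: ("g", z))
a_item = ("Z", "a")

via_composition = app(compose(f_item, g_item), a_item)   # left bracketing
via_application = app(f_item, app(g_item, a_item))       # right bracketing, as in (24)
```

Both routes give X: f(ga). What composition adds is the intermediate constituent of type X/Z, which the two applications never build.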


Next we consider C, the elementary permutator, with semantics Cfba =
fab. This combinator swaps the order of arguments for an argument-taking
object f. Although it does not introduce parentheses on the right, C is a dependency
encoder, unlike K, which is another parenthesis-free combinator.
The function f depends on the arguments a and b, and their change of order
is significant to f. It can be syntacticized as follows:


(29) (X/Y)/Z: f   Y: b   Z: a → X: fab   (C)

The first argument of f must be of the same type as the second argument
in linear order, hence the purported syntacticization in (30) cannot preserve
C-dependencies engendered by the types of arguments and their adjacency.


(30) (X/Z)/Y: f   Y: b   Z: a → X: fab   (*C)

(31) ω1        ω2   ω3            ω1        ω2        ω3
     (X/Y)/Z   Y    Z             (X/Y)/Z   Y         Z
     ------------------ C                   -------- T
             X                              Z/(Z/Y)
                                  ------------------ B
                                  (X/Y)/(Z/Y)
                                  ------------------ B
                                  X/Z
                                  --------------------------- app
                                  X


C's ternary manifestation is behaviorally equal to unary T, binary B,
unary B and application (31). Similarly, its binary version (32a) is equivalent
to the behavior of the same combinators (32b). Unary C is defined in (33).

(32) a. (X/Y)/Z   Y → X/Z   (2C)
     b. ω1        ω2
        (X/Y)/Z   Y
                  -------- T
                  Z/(Z/Y)
        ------------------ B
        (X/Y)/(Z/Y)
        ------------------ B
        X/Z

(33) (X/Y)/Z ⇒ (X/Z)/Y   (1C)

The syntacticization of S the substitutor follows a similar line. Unlike B,
the combinator S assumes a two-argument f in Sfga = fa(ga). (Schönfinkel
had called it fusion, which makes the dependency of both functions on the
remaining argument very explicit.)

We can faithfully reflect the arity and adjacency of the arguments of S in
the following syntacticization.

(34) (X/Y)/Z: f   Y/Z: g   Z: a → X: fa(ga)   (S)

We cannot conceive the following configuration as S because it amounts to
having S faga = fa(ga) for some S. This is different from Sfga = fa(ga).

(35) X/Y/Z: f   Z: a   Y/Z: g   Z: a → X: fa(ga)   (*S)

The following purported syntacticizations of S are wrong because they do
not embody S semantics. The first one violates the dependency of both f and
g on a. The second one violates the adjacency of f and g in Sfga.

(36) a. (X/Y)/Z: f   Y/W: g   W: a → X: fa(ga)   (*S)
     b. (X/Y)/Z: f   Z: a   Y/Z: g → X: fa(ga)   (*S)

Ternary S's work can be done by the syntacticized combinators W, B and
C. Curry and Feys (1958) note the equivalence S = B(B(BW)C)(BB).
Smullyan (1985) gives a simpler formula, S = B(BW)(BBC). These combinators
are explicit in the right column of (37).

(37) ω1            ω2       ω3          ω1            ω2       ω3
     (X/Y)/Z: f    Y/Z: g   Z: a        (X/Y)/Z: f    Y/Z: g   Z: a
     ------------------------- S        ------------ C
     X: fa(ga)                          (X/Z)/Y: Cf
                                        --------------------- B
                                        X/Z/Z: B(Cf)g
                                        --------------------- W
                                        X/Z: W(B(Cf)g)
                                        ------------------------------ app
                                        X: W(B(Cf)g)a = fa(ga)

The binary and unary versions of S, derived from (34), are as follows:

(38) (X/Y)/Z   Y/Z → X/Z   (2S)
     (X/Y)/Z ⇒ (X/Z)/(Y/Z)   (1S)
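
The two equivalences cited for S, and the route through C, B and W in the right column of (37), can be checked on sample arguments. The Python sketch below is only an illustration: the symbolic functions f and g, which build nested tuples to record the application structure, are assumptions of the example.

```python
B = lambda x: lambda y: lambda z: x(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)
W = lambda f: lambda x: f(x)(x)
S = lambda x: lambda y: lambda z: x(z)(y(z))

S_curry_feys = B(B(B(W))(C))(B(B))    # S = B(B(BW)C)(BB)
S_smullyan = B(B(W))(B(B)(C))         # S = B(BW)(BBC)

# Symbolic stand-ins for a two-argument f and a one-argument g.
f = lambda p: lambda q: ("f", p, q)
g = lambda x: ("g", x)

expected = ("f", "a", ("g", "a"))     # fa(ga)
results = [s(f)(g)("a") for s in (S, S_curry_feys, S_smullyan)]
stepwise = W(B(C(f))(g))("a")         # W(B(Cf)g)a, as in (37)
```

All three combinators, and the stepwise route, return the same term fa(ga) for these arguments.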


4. Quaternary combinators 


Let us now consider the combinators with four arguments. The first one is Φ,
with the semantics Φfgha = f(ga)(ha). It can be syntacticized as follows.
Note that f is a two-argument function, and g and h must be functions.


(39) X/W/Y: f   Y/Z: g   W/Z: h   Z: a → X: f(ga)(ha)   (Φ)

It would be wrong to syntacticize it as (40), because the semantics of
Φ would not be ensured on the right-hand side: λy.gy =α λx.gx, but locally
substituting the behaviorally equivalent λy.gy loses the semantics of Φ, viz.
the same a for g and h.


(40) (X/Z)/(W/Z)/(Y/Z): f   Y: g   W: h   Z: a →
     X: λx.f(λx.gx)(λx.hx) [x/a]   (*Φ)

Thus the semantics of Φ is intrinsically related to argument sharing, that
is, to S and W. Curry and Feys (1958) give the equivalence Φ = B(BS)B,
and another one necessarily involving W, both of which symbolize argument
sharing. The correctness of syntactic types in (39) can be checked with the
following derivation involving B and S.


(41) ω1           ω2       ω3       ω4
     X/W/Y: f     Y/Z: g   W/Z: h   Z: a
     -------------------- B
     X/W/Z: Bfg
     ----------------------------- S
     X/Z: S(Bfg)h
     -------------------------------------- app
     X: S(Bfg)ha = B(BS)Bfgha = f(ga)(ha)

I enumerate the other arities of Φ for the record. The unary version will
play a crucial role in the next chapter in radically lexicalizing coordination in
all languages, where it will turn out that X, W and Y must be of the same type
for this special role.


(42) a. X/W/Y   Y/Z   W/Z → X/Z   (3Φ)
     b. X/W/Y   Y/Z → (X/Z)/(W/Z)   (2Φ)
     c. X/W/Y ⇒ (X/Z)/(W/Z)/(Y/Z)   (1Φ)
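
The equivalence B(BS)B for the combinator with semantics f(ga)(ha), and the B-then-S route of derivation (41), can be verified on sample arguments. This is a hedged Python sketch: the name Phi and the symbolic functions f, g and h are assumptions of the example.

```python
B = lambda x: lambda y: lambda z: x(y(z))
S = lambda x: lambda y: lambda z: x(z)(y(z))

Phi = lambda f: lambda g: lambda h: lambda a: f(g(a))(h(a))
Phi_derived = B(B(S))(B)              # B(BS)B

# Symbolic stand-ins; results are nested tuples recording the structure.
f = lambda p: lambda q: ("f", p, q)
g = lambda x: ("g", x)
h = lambda x: ("h", x)

direct = Phi(f)(g)(h)("a")
derived = Phi_derived(f)(g)(h)("a")
stepwise = S(B(f)(g))(h)("a")         # S(Bfg)ha, the route taken in (41)
```

The three results coincide: the single argument a is shared by g and h, and f receives both outcomes.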


Notice that Φ cannot be just S. For example, the following syntactic typing
cannot be Φ, as the derivation shows. The function f is a two-argument
object, not one.


(43) ω1       ω2           ω3       ω4
     X/Y: f   Y/W/Z: g     W/Z: h   Z: a
              ------------------- S
              Y/Z: Sgh
              ---------------------------- app
              Y: Sgha
     ------------------------------------- app
     X: f(Sgha) = f(ga(ha)) ≠ f(ga)(ha)

Now we come to a territory which even Curry and Feys (1958) find
unreasonably complex and unwieldy. The combinator Ψ has the semantics
Ψfgab = f(ga)(gb). Clearly, W must be involved to get two g's, and C must
be there to account for the ordering ag. They give the following equivalence:
Ψ = B(BW(BC))(BB(BB)). We can syntacticize it accordingly.


(44) X/Y/Y: f   Y/Z: g   Z: a   Z: b → X: f(ga)(gb)   (Ψ)
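
Because the Curry and Feys formula for this combinator is hard to follow by eye, it may help to test it on sample arguments. The Python sketch below is an illustration under assumptions: the name Psi and the symbolic functions f and g are introduced only for this example.

```python
B = lambda x: lambda y: lambda z: x(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)
W = lambda f: lambda x: f(x)(x)

Psi = lambda f: lambda g: lambda a: lambda b: f(g(a))(g(b))
Psi_derived = B(B(W)(B(C)))(B(B)(B(B)))   # B(BW(BC))(BB(BB))

# Symbolic stand-ins; results are nested tuples recording the structure.
f = lambda p: lambda q: ("f", p, q)
g = lambda x: ("g", x)

direct = Psi(f)(g)("a")("b")
derived = Psi_derived(f)(g)("a")("b")
```

Both expressions yield f(ga)(gb): W supplies the second copy of g, and C moves it past a.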


Ψ looks artificial from a natural language perspective as well. Argument-sharing
has been attested in all languages, for example Mary wants to study,
and John eats and Barry cooks potatoes (whether these are done by S, W or
Φ is the topic of subsequent chapters). Examples of predicate-sharing are unheard
of. (This is of course not much of an explanation until we show what is
odd about the syntacticized Ψ. That has to wait for another book.)

The predicate-sharing of the kind we see in gapping, for example in (45),
can be conceived as and'(like'chem'kafka')(like'eng'witt').


(45) Kafka liked chemistry, and Wittgenstein engineering. 


But it requires Φ semantics rather than Ψ, i.e. g and h of Φ are interpretively
related in this construction rather than being identical functions, as Steedman
(2000b: 188) observed. The following purported syntacticization of Ψ is
not valid because it fails to capture the semantic dependencies embodied in
Ψ. It is inconsistent about g's domain type.


(46) X/Y/Y: f   Y/Z: g   Z: a   W: b → X: f(ga)(gb)   (*Ψ)


We can also ask what is preventing (44) from receiving an interpretation 
such as f(gb)(ga), rather than f(ga)(gb) as presumed there. After all, both 
(ga) and (gb) are syntactically of the type Y. This is a crucial point, and it 
relates to our understanding of category as consisting of a syntactic type and 
a semantic type. The implication in the syntacticization (44) is that semantics 
of f is like (47a) below, whereas f (gb)(ga) requires (47b). 


(47) a. X/Y/Y: λpλq.fpq   Y/Z: g   Z: a   Z: b → X: f(ga)(gb)   (Ψ)
     b. X/Y/Y: λpλq.fqp   Y/Z: g   Z: a   Z: b → X: f(gb)(ga)


X/Y/Y: λpλq.fpq and X/Y/Y: λpλq.fqp are not of the same category although
their syntactic types are the same. Conflating the arguments to Y on
the syntactic side without showing the semantic side is unhelpful in this example.
I will however continue to use this practice when no confusion arises.
I enumerate the lower arities of Ψ for the sake of completeness.


(48) a. X/Y/Y   Y/Z   Z → X/Z   (3Ψ)
     b. X/Y/Y   Y/Z → (X/Z)/Z   (2Ψ)
     c. X/Y/Y ⇒ (X/Z)/Z/(Y/Z)   (1Ψ)


Next we consider Rosser's (1935) J, with semantics Jfabc = fa(fcb).
Like Ψ, this combinator is also predicate-sharing, which is in this case also
self-embedding. J can be syntacticized as follows.


(49) X/X/Y: f   Y: a   X: b   Y: c → X: fa(fcb)   (J)


There is no language in which we have a phrase which would be in pseudo-English
John wants that Barry a book, to mean ‘John wants Barry to want a
book’. The phrase would have the semantics want'(want'book'barry')john',
i.e. J(Cwant')john'book'barry'. This fact will similarly await explanation.
We see no good reason to include J or Ψ in natural language syntax, either
dependency-wise or constituency-wise, and that should do for the time being
in lieu of an explanation.
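
The claim that the J-style term for the pseudo-English phrase reduces to the intended meaning can be confirmed by evaluation. In this Python sketch the tuple encoding of want' and the string constants are assumptions made for illustration; want' is taken to combine with its object first and its subject second, as in want'book'barry'.

```python
C = lambda x: lambda y: lambda z: x(z)(y)
J = lambda f: lambda a: lambda b: lambda c: f(a)(f(c)(b))   # Jfabc = fa(fcb)

# Hypothetical encoding of want': object first, subject second.
want = lambda obj: lambda subj: ("want", obj, subj)

via_J = J(C(want))("john")("book")("barry")
intended = want(want("book")("barry"))("john")
```

Both via_J and intended evaluate to the same nested term, in which the single predicate want' occurs in the matrix and in the embedded position.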

Notice that for J both the matrix and the embedded f are syntactically
two-argument functions. Note also the C-effect engendered by the order of
the arguments X and Y, to obtain fa(fcb), but not fa(fbc). J is enumerated
in lower arities below.


(50) a. X/X/Y   Y   X → X/Y   (3J)
     b. X/X/Y   Y → (X/Y)/X   (2J)
     c. X/X/Y ⇒ (X/Y/X)/Y   (1J)


We stop at this arity (as Curry and Feys 1958 did) for two reasons:
(a) Higher arities no longer add to our understanding of syntactically revealing
semantic dependencies (it has already exceeded its limits in four), and
(b) we know that S and K are good enough to represent any combination, and
Y is sufficient for recursion (but not necessary; it can be expressed in an
SK-system, albeit awkwardly). The remaining combinators and arities are relevant
to narrowing the kinds of dependencies we see in natural languages. A
computer equipped with an SK-machine can perform any computable function
just fine; see Peyton Jones (1987) for such a virtual machine.


5. Powers and combinations 


The definition of powers (see the appendix) provides a natural generalization
of combinators over functions of various arities. In this section we syntacticize
B² and some combinations of combinators because they are very useful
in defining other generalizations.

Recall that Xⁿ⁺¹ = BXXⁿ, hence B²fgab = BBBfgab = f(gab). Therefore
K² deletes the two elements in K²fab, and S² makes two copies of
the third argument, rather than one copy by S, because S²fgh = BSSfgh =
f(gh)(h(gh)).
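
The recurrence for powers can be run directly. The Python sketch below assumes, in addition to the recurrence stated in the text, that the first power of a combinator is the combinator itself; the function name power and the sample arguments are likewise assumptions of the example.

```python
B = lambda x: lambda y: lambda z: x(y(z))
K = lambda x: lambda y: x
C = lambda x: lambda y: lambda z: x(z)(y)
S = lambda x: lambda y: lambda z: x(z)(y(z))

def power(X, n):
    """X^1 = X (assumed base case) and X^(n+1) = B X X^n."""
    result = X
    for _ in range(n - 1):
        result = B(X)(result)
    return result

# Symbolic stand-ins; results are nested tuples recording the structure.
f1 = lambda p: ("f", p)
f2 = lambda p: lambda q: ("f", p, q)
g2 = lambda a: lambda b: ("g", a, b)

b2 = power(B, 2)(f1)(g2)("a")("b")     # f(gab)
k2 = power(K, 2)("f")("a")("b")        # f: both a and b are deleted
c2 = power(C, 2)(f2)("a")("b")         # fab: two swaps cancel out
c3 = power(C, 3)(f2)("a")("b")         # fba: same as a single C
s2 = power(S, 2)(f2)(lambda h: 3)(lambda n: n + 1)   # f(gh)(h(gh))
```

In the last line g is a constant function returning 3 and h is the successor function, so f receives 3 and 4.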

B² composes a two-argument function with a one-argument function. It
can be syntacticized as (51).


(51) X/Y   Y/Z/W   W   Z → X   (B²)


It will be most useful in binary and unary forms in the chapters to follow. 
I list them below. 


(52) a. X/Y   Y/Z/W   W → X/Z   (3B²)
     b. X/Y   Y/Z/W → (X/Z)/W   (2B²)
     c. X/Y ⇒ (X/Z/W)/(Y/Z/W)   (1B²)


Some other combinations have been found to be quite useful and thus
deserve a name of their own. One source for them is referentially dependent
words (pronouns), which Jacobson (1999) modeled with a combinator she
called Z (not to be confused with Curry and Feys's iterator, Zₙ). Zfga =
f(ga)a, hence Z = B(BW)B, as Szabolcsi (2003) noted. More simply, Z =
BSC. We can see the SC-effect in its syntacticization:


(53) a. X/Z/Y: f   Y/Z: g   Z: a → X: f(ga)a   (Z)
     b. ω1           ω2       ω3
        X/Z/Y: f     Y/Z: g   Z: a
        ----------- C
        X/Y/Z: Cf
        -------------------- S
        X/Z: S(Cf)g
        ----------------------------- app
        X: S(Cf)ga = BSCfga = f(ga)a

Its lower arities are listed below. 1Z is Jacobson's (1999) z. (She wrote
Y/Z as Y^Z.)

(54) a. X/Z/Y   Y/Z → X/Z   (2Z)
     b. X/Z/Y ⇒ (X/Z)/(Y/Z)   (1Z)
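
Both decompositions of Z mentioned in the text can be checked against its definition. This Python sketch is an illustration only; the symbolic functions f and g are assumptions of the example.

```python
B = lambda x: lambda y: lambda z: x(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)
S = lambda x: lambda y: lambda z: x(z)(y(z))
W = lambda f: lambda x: f(x)(x)

Z = lambda f: lambda g: lambda a: f(g(a))(a)   # Zfga = f(ga)a
Z_bsc = B(S)(C)                                # Z = BSC
Z_bwb = B(B(W))(B)                             # Z = B(BW)B

# Symbolic stand-ins; results are nested tuples recording the structure.
f = lambda p: lambda q: ("f", p, q)
g = lambda x: ("g", x)

results = [z(f)(g)("a") for z in (Z, Z_bsc, Z_bwb)]
```

Each of the three yields f(ga)a, with the argument a used once inside g and once as the outer argument of f.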


Rosenbloom (1950) christened BB with the name D (Smullyan’s Dove 
and Turner's 1979 B’). Thus Dfagb = BB fagb = fa(gb). Object g must be 
a function, and a,b need not be functions. We can syntacticize it accordingly: 


(55) X/Y/W: f   W: a   Y/V: g   V: b → X: fa(gb)   (D)

Its lower arities are listed below so that we can compare them with the
unusual combinator to be tackled next. I write the results of the semantics as
well in preparation for their comparison.


(56) a. X/Y/W: f   W: a   Y/V: g → X/V: λx.fa(gx)   (3D)
     b. X/Y/W: f   W: a → (X/V)/(Y/V): λgλx.fa(gx)   (2D)
     c. X/Y/W: f ⇒ (X/V)/(Y/V)/W: λyλgλx.fy(gx)   (1D)

Now consider O. Its definition is given below.

(57) O ≝ λfλgλh.f(λx.g(hx))     Ofgh = f(λx.g(hx))     Thus O = CB²B.

The first argument of O is slightly unorthodox because it takes an unsaturated
function as an argument. Note also that f(λx.g(hx)) is not necessarily
the same as λx.f(g(hx)). Therefore the syntacticized version of O must include
an orphan argument, Z, as an argument of f, unlike D:


(58) X/(Y/Z): f   Y/W: g   W/Z: h → X: f(λx.g(hx))   (O)


Syntactically, the argument types of W/Z and Y/Z above must be the same 
otherwise we do not capture O's semantics. The following purported syntac- 
ticization is therefore wrong. 


(59) X/(Y/V): f   Y/W: g   W/Z: h → X: f(λx.g(hx))   (*O)


I enumerate the lower arities of use for O to show that it is different from
D; cf. (56). 2O is Hoyt and Baldridge's (2008) D.


(60) a. X/(Y/Z): f   Y/W: g → X/(W/Z): λh.f(λx.g(hx))   (2O)
     b. X/(Y/Z): f ⇒ X/(W/Z)/(Y/W): λgλh.f(λx.g(hx))   (1O)
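
The decomposition of O in terms of C, B² and B, and its difference from D, can be observed on sample arguments. In this Python sketch the probe-based function f_on_function, which inspects its function argument by applying it to a fixed symbol, is an assumption of the example, as are the other symbolic functions.

```python
B = lambda x: lambda y: lambda z: x(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)

O = lambda f: lambda g: lambda h: f(lambda x: g(h(x)))   # Ofgh = f(λx.g(hx))
B2 = B(B)(B)                                              # second power of B
O_derived = C(B2)(B)                                      # CB²B
D = B(B)                                                  # Dfagb = fa(gb)

# Symbolic stand-ins. The first argument of O takes a function, so this f
# applies whatever function it receives to the probe "x".
f_on_function = lambda k: ("f", k("x"))
f2 = lambda p: lambda q: ("f", p, q)
g = lambda x: ("g", x)
h = lambda x: ("h", x)

o_direct = O(f_on_function)(g)(h)
o_derived = O_derived(f_on_function)(g)(h)
d_result = D(f2)("a")(g)("b")
```

With O, f is handed the composed function of g and h as a whole, whereas with D it is handed a and the already saturated gb.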


The curious thing about O is that, although it is a combinator (its definition
has no free variables), it is not a supercombinator, because g and
h are free in its lambda-abstracted part of the body, λx.g(hx). Its close relative
D is a supercombinator because its lambdas are all grouped to the left.
All combinators in Table 1 are supercombinators, except Y. However, some
expressions with inner lambdas are indeed supercombinators, for example
λxλy.xy(λz.z)(λw.0).

Notice that, unlike Y, O is finitely typeable. Therefore we must scrutinize 
it in the next chapter whether to confine O to the lexicon, or to let it operate 
freely in syntax. 

Finally, consider another mixture of combinators, BS(BB), equivalently 
BSD, with semantics BSDfgab = fa(gab). It is a natural generalization of 
S over functions with more than one argument. (Other generalizations, such
as fa(gba), are already covered by S.) We can syntacticize it as follows. 


(61) (X/Y)/Z: f   (Y/W)/Z: g   Z: a   W: b → X: fa(gab)   (S")
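
The semantics stated for the mixture BS(BB) can be confirmed on sample arguments. The Python sketch below is an illustration only; the names S_dd and S_dd_direct and the symbolic functions f and g are assumptions of the example.

```python
B = lambda x: lambda y: lambda z: x(y(z))
S = lambda x: lambda y: lambda z: x(z)(y(z))

S_dd = B(S)(B(B))                                                # BS(BB)
S_dd_direct = lambda f: lambda g: lambda a: lambda b: f(a)(g(a)(b))

# Symbolic stand-ins: both f and g take two arguments here.
f = lambda p: lambda q: ("f", p, q)
g = lambda p: lambda q: ("g", p, q)

derived = S_dd(f)(g)("a")("b")
direct = S_dd_direct(f)(g)("a")("b")
```

Both give fa(gab): the argument a is shared by f and g as with S, and g additionally consumes b.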


The name S" is suggested here to reflect its close relation to S and Bi (S' 
is spoken for; it is Turner's 1979 name for Φ.)

The powers of S do not embody linguistically relevant semantic dependencies.
S²fga = BSSfga = f(ga)(a(ga)), i.e. a is both a predicate over
g and an argument of g. Likewise, powers of C are unhelpful. C² = BCC = I.
C³ = BCC² = C. However S" seems quite relevant. We shall see linguistic
examples requiring S" in Chapter 5.

The crucial link in the syntactic types of S" is the argument types of X 
and Y, which must contain the same type, viz. Z, in the right order. Some 
purported types for f such as (X/Y)/V or (X/Z)/Y would not be S" semantics. 
Lower arities of S" materialize as follows. 


(62) a. (X/Y)/Z   (Y/W)/Z   Z → X/W   (3S")
     b. (X/Y)/Z   (Y/W)/Z → (X/W)/Z   (2S")
     c. (X/Y)/Z ⇒ (X/W/Z)/(Y/W/Z)   (1S")


6. Why syntacticize? 


This concludes our syntacticization of the combinators. Whether combinators 
or supercombinators, they lend themselves to variable-free syntax in which 
all the semantic dependencies are imported into syntactic dependencies, and 
no other dependency is engendered by syntax, hence every combination is 
solely adjacency-based, including specification of argument-taking, i.e. lexi- 
cal categories. 

Schönfinkel's idea appears to be actually necessary to directly import adjacency
semantics to adjacency syntax. This result was independently discovered
by Curry (1929) and Ades and Steedman (1982). Chomsky (1995) has
claimed that binary merge is virtually conceptually necessary. (Unary move is
considered virtually conceptually necessary as well, in Chomsky 2005, which
is related to Schönfinkel's T.) We now know that they are not. T follows
from S and K. Binary merge follows from currying, which is a theorem. The
theorem crucially relies on the prefixed binary juxtaposition of Schönfinkel.
Therefore, Chomsky is right to claim that it is a conceptual necessity, if we
take that to mean a theoretical necessity, but wrong to dismiss a need for
scientific justification of it. Combinators show how we can justify it.

The discussion in this chapter might have given the impression that the 
practice expounded here is to promote the meaning-to-form direction of trans- 
lating semantic types to syntactic types, as opposed to form-to-meaning trans- 
lation of for example Chomsky (1970), where the X-bar theory of phrase 
structure is mapped onto meanings, or the Klein and Sag (1985) model, where 
syntactic categories and phrase structure rules are translated into semantic 
types. 

This is not the case. The ‘:’ notation embodies lexical codetermination 
rather than determination. It is a radical lexicalization and combinatorization 
of Bach’s (1976) rule-to-rule hypothesis, by which, rather than Montague- 
Bach-style rules, which would make us worry about whether the syntactic one 
or the semantic one is the determinant, we only have words with combinatory 
categories. By their very nature, they need to be specified uniquely. Thus the 
discussion of priority of syntactic rules and semantic rules becomes moot. 

The reason for going through the trouble of syntacticizing the combinators 
is worth reiterating: they work on semantic objects, functions if you like, 
whereas human language observables are syntactic objects, namely words. 
Of course there can be other ways to go from semantics to syntax or from 
syntax to semantics. The combinatory theory suggests that adjacency is all 
we need. 

The point of importing all semantic dependencies to syntax and creating 
no extra ones is to obtain a purely syntactic type-driven syntax. This aspect is 
the main source of confusion in analogies to form-to-meaning and meaning- 
to-form approaches. Like all analogies including mine in the preface, it is 
misleading, and obscures the true nature of what combinatory syntax does: 
it gives us compositional semantics for free, and in lock-step with syntax, 
i.e. incrementally. The talk of having “a semantically motivated grammar” 
in correspondence theories to hint at the psycholinguistic plausibility (e.g. 
left-to-right processing) is unhelpful because there can be no semantically 
unmotivated grammar. A grammar without semantics is no grammar. 

The combinators covered so far seem to be deterministically translatable 
to syntactic types, but they were designed to be that way to begin with. Radi- 
cal lexicalism predicts that natural language is one domain in which one-way 
determinism cannot hold for all compositional meanings. It does depend on 
the word, and the possible languages we get out of these singularities do not 
differ in arbitrary ways, due to adjacency being the only primitive on which 
multiple constraints on language can act, for example the constraints which 
manifest themselves in the knowledge of words including predicate-argument 
structure, constituent structure, information structure and intonational struc- 
ture. 

It will turn out that most of the syntactic manifestations of combinators, 
and most of the combinators, are only relevant to the lexical items, not to the 
freely operating universal rules. I took pains to enumerate them in all arities 
so that we can compare the alternatives from a linguistic perspective. This 
requires a set of substantive principles to choose which ones go to the lexicon 
and which ones stay as freely operating. This is the topic of the next chapter. 


Chapter 5 
Combinatory Categorial Grammar 


Mark Steedman's Combinatory Categorial Grammar, CCG, is a theory of 
syntax-semantics for natural languages in which only the combinators that 
directly and solely bear on constituency operate in syntax freely, all others 
being radically lexicalized. His conjecture so far has been that this is a 
BTS system. Free operation arises from noninterdefinability. His counteracting 
force for this theoretical result is the empirical test of constituency. No 
combinator which is syntacticized can do the work of others, and its syntactic 
work cannot be done by others. 

CCG is strictly Schönfinkelian because the only primitives of the system 
are forward and backward application, which are the syntacticized versions 
of Schönfinkel's juxtaposition. All lexical functions are curried, all syntactic 
rules arise from combinators, and every principal functor in syntax schema- 
tized below faces only one adjacent syntactic object: 


(Skike. oss Rep: 


X is called the principal functor. The result type of the binary combination 
is uniquely determined by X. This is semantic in origin (but clearly syntacti- 
cized), because it amounts to saying that X is the projected result type in the 
local configuration of (1). Because this result arises from the semantics and 
syntax of combinators as shown in the previous chapter, CCG does not need 
an extraneous projection principle; it is predicted by the type-dependence of 
radically lexicalized natural language grammars. 


1. Combinators and wrapping 


By definition, any system that employs surface wrap ceases to be a combina- 
tor system, because no combinator can do the work of wrap, and if we assume 
that a syntacticized rule does the work of wrap, no combinator can match it 
on the semantic side. We would lose the combinatory base of directly and im- 
mediately associating an interpretation with every syntactically combinable 
constituent. 

This result might be puzzling at first, knowing that C does the equivalent 
of wrap, because Cabc = acb. However, this behavior presumes a wrap 
interpretation only if we think of ab as a holistic unit in syntax or semantics, 
which is split by c when c is wrapped inside it (it is also commonly referred to 
as “ab wraps around c”). The syntacticization of C, repeated below, made no 
such assumptions. Y and Z are categories of independent syntactic objects. 
It would not matter whether we binarize the rule as in the second line. The 
string-view of C is provided in (2c), in preparation for its comparison with 
wrap. 


(2) a. (X/Y)/Z: a  Y: b  Z: c → X: acb        (C) 
    b. (X/Y)/Z: a  Y: b → X/Z: λc.acb 
    c. s1             s2      s3 
       (X/Y)/Z: a     Y: b    Z: c 
       s1 s2 s3 := X: acb 


We must distinguish systems with C, which are combinatory, from sys- 
tems with wrap, which are not. So what exactly is syntactic wrap, and why is 
wrap not so subversive when done lexically or semantically? Here we must 
look to Bach (1980, 1984) and Dowty (1996). Below is Bach's (1984) syntactic 
formulation of wrap translated to current notation. The slash is modalized to 
wrap, following Jacobson (1992). 


(3) s1           s2         (wrap) 
    X/wY: a      Y: b 
    first(s1) s2 rest(s1) := X: ab 


where first(x) means the first element of a list of structures for Bach 
(first word for Dowty 1996), and rest(x) means the remainder. 
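
The difference between the string operation in (3) and the string view of C in (2c) can be made concrete. The following is a small sketch, assuming that strings are modeled as Python lists of words and that first(x) picks out the first word, as in Dowty's reading. The function names wrap and concat are illustrative assumptions.

```python
def wrap(s1, s2):
    """Surface wrap as in (3): first(s1) s2 rest(s1), over lists of words."""
    return s1[:1] + s2 + s1[1:]


def concat(*strings):
    """String view of a combinatory rule such as (2c): plain adjacency."""
    result = []
    for s in strings:
        result.extend(s)
    return result


# Wrap must look inside its first operand to find the infixation point.
wrapped = wrap(["persuade", "to", "do", "the", "dishes"], ["John"])

# A combinatory rule never splits an operand: the strings stay intact.
adjacent = concat(["hammer"], ["the", "metal"], ["flat"])
```

The two functions can yield the same surface string, but only wrap needs access to the internal structure of one of its operands, which is the property that has no combinatory counterpart.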


Notice that, semantically speaking, wrap is application, whereas surface- 
syntactically there is no combinatory counterpart. Naturally, this cannot be 
C. Observe also Bach's derivation of persuade John to do the dishes as surface 
wrap: 


(4) persuade          to do the dishes    John 
    (S\NP)/wNP/VP     VP                  NP 
    persuade to do the dishes := (S\NP)/wNP 
    persuade John to do the dishes := S\NP 


Let us now consider Dowty's examples for wrap, the resultatives and 
verb-particle pairs: hammer (the metal) flat, let (the dog) loose, look (the word) up, 
where discontinuity is shown by parentheses. As he points out, hammer round 
does not have the same behavior as hammer flat, therefore we must assume 
hammer flat as a lexical item, which necessarily wraps. 

The implicit assumption here is that the meaning of hammer flat is something 
like hammerflat', not hammer'flat'. The application of hammerflat' to 
metal' gives us hammerflat'metal', stringwise hammer the metal flat, following 
(3), but not (2). This is indeed wrap in the noncombinatory sense because 
no combinator can split hammerflat' into pieces, whereas C can do that to 
the sequence hammer'flat' easily. Similarly, look up as a lexical entry can 
be lookup' or look'up'. In the first case, there is no combinator to get look 
the word up, hence a combinatory system must assume two semantic objects 
look' and up', whereas a wrap system (of type-dependent or structure-dependent 
variety) would have more degrees of freedom in lexical options. 

Dowty extends this view to phrasal items and the Wackernagel position 
(the second position, to which clitics universally tend to attach themselves 
phonologically), to the so-called nonadjacent phenomena in languages. It was 
also the motivation in Bach (1984) to analyze persuade John to do the dishes 
as the wrap of John, a syntactically and semantically independent object, in- 
side persuade to do the dishes. 

This move reintroduces C in addition to wrap for the reasons 
just discussed: we must assume that the dependencies arise from 
C persuade' tdtd' john', because they are syntactic phrases. This is the motive 
for '/w' in (4), which turns everything into function application semantics, 
i.e. persuade' john' tdtd'. The surface combination, however, is not C. 

Bach's formulation of wrap is independent of phonology, but Dowty's in- 
terpretation of it is morphophonological, because it assumes that wrap knows 
word boundaries. In any case, an “infix here” point must be remembered for 
every lexical item, which must be maintained properly throughout phrase 
combination, and herein lies another problem. In languages where the no- 
tion of word is linear-recursive (such as Turkish and Gusii; see Hankamer 
1989, Creider, Hankamer and Wood 1995), this seems to require a finite-state 
machine running through word boundaries during the syntactic process, in 
addition to the syntax-phonology interface with its own computations of ex- 
actly the same nature. 

The Wackernagel phenomenon below forces this assumption, where the 
focus- and coordination-clitic de necessarily wraps into the second conjunct 
with a recursive first word. This phenomenon and its related wrap behavior 
must be explained, rather than assumed as knowledge to go with a lexical 
slash such as '/w'. 


(5) Mehmet bugün gelecek,    Ev-de-ki-nin-ki-ler         de  yarın. 
    M      today come-FUT    house-LOC-ki-POSS-ki-PLU    FOC tomorrow 
    lit. 'Mehmet is coming today, and the ones of who is in the house tomorrow' 
    meaning, e.g. 'The family of the girlfriend of the boy in the house will 
    come tomorrow.'                                                  Turkish 


The semantic (therefore lexical) use of wrap does not threaten the combi- 
natory base of CCG, because it amounts to a local use of C rather than wrap. 
It has been employed by Szabolcsi (1989) and Steedman (2000a) to handle 
for example ditransitive constructions and VSO languages. 

Examples (6a-b) are from Szabolcsi, where she assumes for reasons cited 
in the paper the category (S\NP)/NP/PP for introduce, rather than the surface 
word order (S\NP)/PP/NP. Then, because of (6c), we must apply unary 
B to VP\(VP/PP) to get (VP/NP)\(VP/PP/NP) first, in the lexicon, and apply 
unary C, again in the lexicon, to simulate wrap, which yields lexically the 
category (VP/NP)\(VP/NP/PP). 


(6) a. John  introduced      Mary  to himself          Szabolcsi (1989: 307) 
             (S\NP)/NP/PP          VP\(VP/PP) 
    b. John introduced Mary  to herself 
                             VP\(VP/PP) 
                             ------------------- lex B 
                             (VP/NP)\(VP/PP/NP) 
                             ------------------- lex C 
                             (VP/NP)\(VP/NP/PP) 
    c. John introduced Mary to himself and Susan to herself. 


This way we maintain a type-raised syntactic object in all cases including 
reflexives, which is an important part of Szabolcsi's organization of grammar. 

Steedman (1996b, 2000a) puts LF to work in (6). I compare the three 
CCG proposals for LF phenomena in Chapter 6. His suggestion for VSO 
languages is a category such as (7a) for Welsh, because of (7b-c). This is also 
lexical/semantic wrap, not syntactic, because the lambda term of the verb is 
Cverb'. 


(7) a. VSO verb := S/NP/NPagr: λx1λx2.verb'x2x1 
    b. Gwelodd Wyn ef ei hun          Awbery (1976: 131) 
       saw     Wyn himself 
       'Wyn saw himself.' 
    c. *Gwelodd ef ei hun Wyn 
        saw     himself   Wyn 


The treatment of adjacency creates two worlds for combinatory linguistic 
categories, one in which adjacency as the sole base looks at possible cat- 
egories (i.e. possible languages) by enumerating all adjacency-based cate- 
gories, and the other in which adjacency effects and other factors are incorpo- 
rated into theories as needed (e.g. Moortgat and Oehrle 1994). Because of the 
mediating subtheories in the latter kind of framework and the use of a logical 
form in the first one, cotranslatability of the categories in the two categorial 
worlds is becoming increasingly difficult. One can see the clear split in Com- 
binatory Categorial Grammars and Type-Logical Grammars, although there 
are many points of contact and good sources of inspiration both ways (cf. 
Morrill 1994, Moortgat 1988b, Carpenter 1997, Moortgat and Oehrle 1994, 
Baldridge 2002, Kruijff and Baldridge 2004, Hoyt 2006). 


2. Linguistic categories 


What are the solely adjacency-based categories for language? The crux of 
the matter is that whatever the nature of these categories is, they are the cat- 
egories of syntactic objects. This is an empirical requirement, because the 
observables are the syntactic objects, namely words, not the semantic ob- 
jects. A category is a hypothesis about what the syntax-semantics connection 
of the observables could be. That of course does not prevent categories from 
being semantic in nature, as Edmund Husserl (1900) claimed to be the case: 


Clearly we may say that if presentations, expressible thoughts of any sort 
whatever, are to have their faithful reflections in the sphere of meaning- 
intentions, then there must be a semantic form which corresponds to each 
presentational form. This is in fact an a priori truth. And if the verbal re- 
sources of language are to be a faithful mirror of all meanings possible a 
priori, then language must have grammatical forms at its disposal which give 
distinct expression, i.e. sensibly distinct symbolization, to all distinguishable 
meaning-forms. Logical Investigations vol II: 55 


Many categorial grammarians consider Husserl's statement to be the birth 
of categorial grammar. This is not surprising, because of the implicit 
commitment since the early years of categorial grammar to have the substantive 
categories associate only with verbal resources, namely words (as opposed to 
say with both words and grammar rules). This is an explicit commitment in 
Combinatory Categorial Grammar: 


(8) Radical Lexicalism: 
All language-particular information is in the lexicon. 


The term Radical lexicalism is due to Lauri Karttunen (1989). The method 
is described in the appendix. Radical lexicalism in light of Husserl’s desiderata 
suggests two manifestations of categories: formal categories and substantive 
categories. Formal categories are universal generalizations of the substantive 
categories, hence they are not different in kind. We can think of the 
syntacticization of combinators as yielding formal categories, for example 
X/Y: f, X: fa and Y: a below for application. 


(9) X/Y: f  Y: a → X: fa        (>) 


Any substantive category can substitute for X and Y above (provided that 
the desired adjacency configuration required by the rule is satisfied). This is 
not true of substantive categories, say S\NP, where S means “sentence” and 
NP means ‘noun phrase’. Only NPs can substitute for NP, to get Kafka, or 
The stories of Poe etc. 

Similarly, semantically open propositions can be substituted for S/NP to 
get The man devoured, or Mary hit, but not *Kafka chemistry where an ob- 
ject of category NP (Kafka) attempts to substitute for S/NP. Note that the 
sequence Kafka chemistry can be predicational, but this interpretation is par- 
asitic on a verb, as in gapping: 


(10) Wittgenstein adored engineering, and Kafka chemistry. 


Here, the required category is not S/NP for Kafka. It is NP, as the seman- 
tics of the sentence proves. 

To be able to distinguish Wittgenstein adored from adored Wittgenstein in 
the Husserlian sense, we must categorize them differently although both are 
open propositions semantically. In CCG parlance, the former is S/NP and the 
latter is S\NP. The difference in slashes is a forced move of syntacticization. 

Unlike the semantically-motivated combinators in which we can choose to 
represent all functions in prefix notation, the syntactic objects of languages 
vary in directionality. Tagalog is head-initial whereas Turkish is head-final. 
English is head-initial (e.g. of the book) and head-medial, as in its basic word 
order SVO. Thus the syntacticization of combinators must consider this aspect 
as well to complete the picture. 

We can think of ‘backward application’ as the only other possibility of ap- 
plication because there is only one function and one argument in application, 
i.e. only one slash: 


(11) Y: a  X\Y: f → X: fa        (<) 


The semantic dependencies of application are preserved in this version as 
well; the semantic result is fa, not af. This factoring of order into the cate- 
gories is reflected in the name of the rule, viz. ‘<’ for backward application 
and ‘>’ for forward application. 

Thus the following purported manifestations of application are ruled out 
because they do not preserve the semantic dependency instigated by order: 


(12) a. X\Y: f  Y: a → X: fa        (*>) 
     b. Y: a  X/Y: f → X: fa        (*<) 
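
The way order is factored into the categories can be sketched in code. The following is a minimal illustration in Python, assuming that basic categories are plain strings and that a function category is a record of result, slash and argument. The names Fn, forward and backward are illustrative assumptions, and semantics is left out.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Fn:
    """A function category: result, slash ('/' or '\\'), argument."""
    result: "Cat"
    slash: str
    arg: "Cat"


Cat = Union[str, Fn]   # basic categories are plain strings here


def forward(left: Cat, right: Cat) -> Optional[Cat]:
    """(>)  X/Y  Y  =>  X"""
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward(left: Cat, right: Cat) -> Optional[Cat]:
    """(<)  Y  X\\Y  =>  X"""
    if isinstance(right, Fn) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


s_fwd = Fn("S", "/", "NP")     # S/NP
s_bwd = Fn("S", "\\", "NP")    # S\NP
```

Under these assumptions the two purported rules in (12) simply never fire: the slash of the function has to point at the position where the argument is found.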


In a configuration where there is more than one slash, for example in the 
binarized composition (13a), the possibilities in (13b-d) preserve the seman- 
tic dependency of order, but the ones in (13e-h) do not. Thus we can subsume 
Steedman's (2000b) principles of consistency and inheritance, which helped 
to eliminate configurations such as (13e-h), by the semantics of order inherent 
in combinators. 

(13) a. X/Y: f  Y/Z: g → X/Z: Bfg        (>B) 
     b. Y\Z: g  X\Y: f → X\Z: Bfg        (<B) 
     c. X/Y: f  Y\Z: g → X\Z: Bfg        (>B×) 
     d. Y/Z: g  X\Y: f → X/Z: Bfg        (<B×) 
     e. X/Y: f  Y\Z: g → X/Z: Bfg        (*>B) 
     f. Y/Z: g  X\Y: f → X\Z: Bfg        (*<B×) 
     g. X\Y: f  Y/Z: g → X/Z: Bfg        (*>B×) 
     h. Y\Z: g  X/Y: f → X\Z: Bfg        (*<B×) 


These restrictions are forced moves in the theory. In (13e-f), the direction- 
ality of Y is respected but the directionality of Z is not. In (13g-h), the direc- 
tionality of Z is respected but the directionality of Y is not. All directionalities 
are respected in (13a-d). Notice that directionality is inherently a syntactic 
property of the argument and not the result, as first observed by Steedman 
(1991b). Thus there is no directionality of X above, and all directionalities 
are accounted for in the logic of the argument from order semantics. 
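
The argument from order semantics can be turned into a small check. The following sketch, in Python, assumes the same toy representation of categories as records of result, slash and argument. The names Fn and compose are illustrative assumptions. It licenses exactly the configurations in which the slash on Y points at the secondary function and the slash on Z is inherited unchanged.

```python
from typing import NamedTuple, Optional


class Fn(NamedTuple):
    result: object   # a basic category (str) or another Fn
    slash: str       # "/" or "\\"
    arg: object


def compose(left, right) -> Optional[Fn]:
    """Binary B restricted to (13a-d): X|Y and Y|Z give X|Z.

    The slash on Y must point at the secondary function, and the
    slash on Z is inherited unchanged by the result.
    """
    if isinstance(left, Fn) and isinstance(right, Fn):
        # (13a), (13c): X/Y  Y|Z  =>  X|Z
        if left.slash == "/" and left.arg == right.result:
            return Fn(left.result, right.slash, right.arg)
        # (13b), (13d): Y|Z  X\Y  =>  X|Z
        if right.slash == "\\" and right.arg == left.result:
            return Fn(right.result, left.slash, left.arg)
    return None
```

Because the result slash is copied from Z rather than chosen freely, outcomes such as (13e-f) cannot be produced, and inputs such as (13g-h) are rejected outright.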

A backward or forward slash is not necessarily a crossing slash. This 
information needs to be contextualized, for example as '/×' for a crossing forward 
slash and '/⋄' for a harmonic forward slash. A “don’t care” forward slash can 
be contextualized too, as '/·'. These aspects are relevant in contexts in which 
the curried binary configuration involves two or more slashes, as above. Since 
there is one slash in application, we can make categories application-only 
too, with the most restrictive slash '/⋆' (likewise for the backward slash). In a 
purely applicative system, these modalities exhaust the possibilities for slash 
contextualization. 

These are the modalized combinatory categories of Baldridge (2002). He 
defines the following hierarchy as a way of compiling the knowledge of slash 
compatibility: 


(14) CCG type lattice for slash modalities (from Baldridge and Kruijff 2003): 

            ⋆ 
          /   \ 
         ⋄     × 
          \   / 
            · 

The dot is the least restrictive modality, the star the most restrictive. The 
diamond and the cross are partially restrictive and mutually incompatible. Thus 
a '/⋆' slash is only compatible with itself, and '/·' is compatible with all 
forward slashes (similarly for backward slash). The least restrictive modality is 
omitted by convention to avoid further notational clutter. 


(15) '\' is the same as '\·'. '/' is the same as '/·'.        (dot omission) 


Now we can refine the syntacticized combinators of this section: 


(16) a. X/⋆Y: f  Y: a → X: fa                  (>) 
     b. Y: a  X\⋆Y: f → X: fa                  (<) 
     c. X/⋄Y: f  Y/⋄Z: g → X/⋄Z: Bfg           (>B) 
     d. Y\⋄Z: g  X\⋄Y: f → X\⋄Z: Bfg           (<B) 
     e. X/×Y: f  Y\×Z: g → X\×Z: Bfg           (>B×) 
     f. Y/×Z: g  X\×Y: f → X/×Z: Bfg           (<B×) 


The goal of introducing the combinatory modalities is to make finer 
distinctions in the Husserlian sense, for example to distinguish S/⋄NP of 
Wittgenstein would adore from S/×NP. The need to distinguish these categories is 
forced by the data, under the assumption of adjacency-only syntax: 


(17) The field which I think that Wittgenstein would adore is web engineering. 


We are forced by related data to categorize that as S'/⋄Sfin: 


(18) *The philosopher who I think that would adore Wittgenstein is Russell. 


This category disallows the combination of that with would adore Wittgen- 
stein, which is the critical difference between (17) and (18). The newly in- 
troduced syntacticization in (16) would be in vain if we could not distinguish 
*that would adore Wittgenstein from that Wittgenstein would adore. The critical 
steps of (17) and (18) are shown below, in which the differing possibility 
of (16c) versus (16e) does the critical work. Thus we must distinguish S'/⋄Sfin 
from S'/×Sfin, the latter of which would allow (19b). 


(19) a. that          Wittgenstein would adore 
        S'/⋄Sfin      Sfin/⋄NP 
        -------------------------------------- >B 
        S'/⋄NP 
     b. that          would adore Wittgenstein 
        S'/⋄Sfin      Sfin\NP 
        -------------------------------------- *>B× 
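
The work done by the modalities in (19) can be simulated. The following is a sketch in Python under stated assumptions: modalities are encoded as the strings "*", "d", "x" and "." for star, diamond, cross and dot; a rule may use a slash if the slash carries the rule's own modality or the dot; and the result inherits the modality of the Z slash. The names Fn, licenses and forward_compose are illustrative, and the licensing test is my reading of the lattice in (14), not a definitive implementation.

```python
from typing import NamedTuple, Optional, Tuple

STAR, DIAMOND, CROSS, DOT = "*", "d", "x", "."


class Fn(NamedTuple):
    result: object
    slash: str   # "/" or "\\"
    mod: str     # one of STAR, DIAMOND, CROSS, DOT
    arg: object


def licenses(rule_mod: str, slash_mod: str) -> bool:
    """A rule marked rule_mod may use a slash marked slash_mod."""
    return rule_mod == STAR or slash_mod == DOT or slash_mod == rule_mod


def forward_compose(left, right) -> Optional[Tuple[str, Fn]]:
    """(16c) >B and (16e) >Bx; returns (rule name, result) or None."""
    if not (isinstance(left, Fn) and isinstance(right, Fn)):
        return None
    if left.slash != "/" or left.arg != right.result:
        return None
    rule, need = (">B", DIAMOND) if right.slash == "/" else (">Bx", CROSS)
    if licenses(need, left.mod) and licenses(need, right.mod):
        return rule, Fn(left.result, right.slash, right.mod, right.arg)
    return None


that = Fn("S'", "/", DIAMOND, "Sfin")                 # S'/⋄Sfin
witt_would_adore = Fn("Sfin", "/", DIAMOND, "NP")     # Wittgenstein would adore
would_adore_witt = Fn("Sfin", "\\", DOT, "NP")        # would adore Wittgenstein
```

With the diamond on that, harmonic composition in (19a) goes through and crossing composition in (19b) is blocked. Replacing the diamond with the dot would let (19b) through, which is the overgeneration the modalized category is meant to prevent.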


The star modality is also a forced move, given the adjacency assumption 
and Husserl’s desiderata. And's category must be more refined than (S\⋄S)/⋄S 
and (S\×S)/×S: 


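(20) a. *player  that            shoots   and          he misses     (Baldridge 2002) 
                 (N\N)/(S|NP)    S\NP     (S\⋄S)/⋄S    S 
                                          ---------------------- > 
                                          S\⋄S 
                                 ------------------------------- <B 
                                 S\NP 
     b. *Kafka      and          he studied chemistry   smiled. 
         S/(S\NP)   (S\×S)/×S    S                      S\NP 
                    --------------------------------- > 
                    S\×S 
         -------------------------------------------- <B× 
         S/(S\NP) 
         ------------------------------------------------------- > 
         S 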


Under the present method of syntacticization without extra assumptions 
over and above adjacency, both examples would be fine if we did not have 
(S\⋆S)/⋆S for and, and did not work with the modalized combinators of (16). 
For example, (13f) would allow (20b) as shown in the derivation. The examples 
in (20) also demonstrate a convenient generalization of directionality: 
the underspecified slash. 


(21) '|m' stands for '\m' and '/m', with modality m.        (dir. underspecification) 
     m can be underspecified too, as in (20b)'s Kafka. 


We can now distinguish the relative pronouns that and whom by typing 
them with the categories (N\N)/(S|NP) and (N\N)/(S/NP), respectively: 


(22) a. the field that [Kafka admired]S/NP 
     b. the field that [admired Kafka]S\NP 
     c. *the chemist  whom            admired Kafka 
                      (N\N)/(S/NP)    S\NP 
                      ------------------------------ *** 


The last bit of differentiation to make good on Husserlian categorization 
is the difference in like versus likes. As these are related but different words 
(in fact, the same lexeme is involved), we would expect their categories to 
be related but different. Following Kay (1985), Shieber (1986), we decorate 
the basic (nonslashed) categories with features. We abbreviate them to save 
space; 3s is short for AGR=3s, where AGR is an agreement feature. 


(23) likes := (S\NP3s)/NP 
     like  := (S\NP¬3s)/NP 


The feature geometry can in principle be language-particular, and need not 
concern us here. We shall however make use of common generalizations 
such as AGR and FIN(inite). Suffice it to say that we do not need a sophis- 
ticated theory such as that of Gazdar et al. (1985), Pollard and Sag (1994), 
Calder, Klein and Zeevat (1988) in which unification does nontrivial linguis- 
tic work, whereas the working hypothesis of the book is to let syntacticized 
combinators do all the work except basic category matching. 

The theory also differs from Chomsky (1995) where a universal feature 
geometry is attempted. Features can be radically lexicalized just like com- 
binatory categories. The process might miss some early generalizations over 
categories and features, but so be it. The generalizations that will arise from 
order semantics are our present concern. We can attempt to recapture the same 
generalizations, and hopefully more, after we flesh out all attested linguistic 
categories. One such example is the reworking of functional features as 
combinatory categories, which we do in §9.7. 

To summarize, the following is the landscape of the syntactic types. 


(24) Take F to be a feature geometry (a finite set of features). 
     Let 𝒱 ⊆ F^v(V) be a set of valuations of features from some value space 
     V mapped by v. 
     Take B to be a finite set of basic categories (without slashes). 
     Let S = B^𝒱. (All possible feature-decorated basic categories) 
     Let M = {·, ⋄, ×, ⋆}. (The set of modalities) 

     Define C (the set of possible syntactic types): 
     Any member of S is a potential type in C. 
     If A ∈ C and B ∈ C, then A |m B ∈ C, for some m ∈ M, and | ∈ {\, /}. 
     A^B ∈ C if A ∈ C, B ∈ C. 
     Nothing else is in C. 


Explicitly enumerating the countably infinitely many distinguishable cate- 
gories is the starting point of sieving some of the categories as unlikely cate- 
gories for human languages. 
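
The enumeration can be started mechanically. The following is a minimal sketch in Python, assuming a hypothetical set of two basic categories and ignoring both feature decoration and the exponent categories, so it covers only the slash clause of (24). The names BASIC, MODS, SLASHES and types_up_to are illustrative assumptions.

```python
from itertools import product

BASIC = ("S", "NP")            # a hypothetical B; features are left out
MODS = (".", "d", "x", "*")    # dot, diamond, cross, star
SLASHES = ("/", "\\")


def types_up_to(depth):
    """Members of C whose slashes nest at most `depth` levels deep."""
    level = set(BASIC)
    for _ in range(depth):
        level = level | {
            (a, slash, mod, b)
            for a, b in product(level, repeat=2)
            for slash in SLASHES
            for mod in MODS
        }
    return level


sizes = [len(types_up_to(d)) for d in range(3)]
```

Even with two basic categories the set grows from 2 to 34 to 9250 types within two levels of nesting, which gives a sense of how much sieving the substantive principles have to do.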

Naturally, Kafka’s category NP is not discriminating enough, thus we can 
write NP: kafka' to distinguish it from NP: wittgenstein'. Such obvious 
distinctions will be abbreviated for the sake of exposition. 

The exponent category A^B semantically denotes a function from B to A, 
and differs from A|B because it does not introduce a syntactic function. It 
is the main syntactic source for Jacobson 1999-style combinatory referential 
dependencies, and it has predictive powers in that field, for example relating 
the extraction domain (N\N)/(S|NP) to the relativization domain (N\N)/S^NP 
of resumptive pronouns. Its use in CCG so far has been constrained to cases 
where B is a basic category. 

One final constraint on lexical syntactic types relates to lexical general- 
izations, where we can refer to a set of types and pick the ones that satisfy a 
constraint. It is the dollar convention of Steedman (2000b). 


(25) T$A stands for the finite set of categories TA            ($-convention) 
     such that functions in T are lexical and onto T. 
     A can be empty. 
     T$A is empty if T is empty. 


For example, S$ for Turkish is {S, S\NP, S\NP\NP, S/(S\NP), ...}. The set 
S/NP$ would be empty. S\$NP for English is the set {S\NP, (S\NP)/NP, 
(S\NP)/NP/NP, ...}. Categories (S\NP)/PP and S/NP are excluded. 

The claim of CCG is that a grammar solely consists of radically lexicalized 
categories, lexically pairing the combinatory syntactic types decorated 
with features with a PADS, per word. Any context-free phrase-structure 
grammar and linear-indexed grammar can be reduced to its lexicon if we are 
willing to translate distributional categories such as N, V, A, P to combinatory 
categories. A category is a rule as an intensional device. Practicing linguistics 
“without rules” and “with principles” does not change the operative maxim. 
Hence Bach's rule-to-rule hypothesis is relevant to any linguistic theory that 
makes use of the notion of computation and Turing representability where the 
notion of "rule" is built-in. 

This brings us back to the troublesome interaction of feature spaces, rules 
and mappings. All mappings leak, unless they are lexical. Even then they are 
underdetermined by external meanings, which is why some statistical book- 
keeping must be connected to the use of a lexical correspondence. Radical 
lexicalization adds to this observation the property that if we radically lex- 
icalize all structure-building, then one end of the mapping or rule ought to 
be some kind of compositional semantics (logical form, predicate-argument 
structure, dependency relations, etc.). Radical lexicalism in this narrow sense 
goes back further than Karttunen (1989), who coined the name. In the famous 
1960 conference which also included contributions by Chomsky and replies 
to and from his critics, Lambek (1961: 169) expressed the program: 


For our purpose it will be convenient to think of a phrase structure grammar 
as follows: the dictionary assigns to each atomic phrase a finite number of 
primitive types. The grammar consists of a finite number of rules of the form 
p_i p_j → p_k where the p_i are primitive types.[fn] 

While it seems unlikely that the elimination of grammatical rules in favor of 
dictionary entries can be carried out for every phrase structure grammar in 
this sense (without making the dictionary infinite), this can be done in many 
examples (in fact all that I have tried). 


His following suggestion can be taken as the start of the program: “It may 
happen that type assignments in a dictionary entry are in a sense stronger than 
the explicit rules of a phrase structure grammar” (Lambek 1961: 170), which 
he illustrates in the remainder of the paper using pronouns and wh-items. 


3. CCG is nearly context-free 


The inadequacy of categorial grammars might have been thought to be true in 
1964— with the crucial exception of Lambek (1958, 1961). It became doubt- 
ful by the publication of Geach (1972), Shaumyan (1977), Ades and Steed- 
man (1982), Joshi (1985) and Oehrle, Bach and Wheeler eds. (1985/1988), 
and proven to be wrong by Joshi, Vijay-Shanker and Weir (1991), Vijay- 
Shanker and Weir (1994). 

The emerging formal class of languages, which Aravind Joshi named 
mildly context-sensitive languages (MCSL) in its upper limit, are a super- 
class of context-free languages and subclass of context-sensitive languages, 
with a well-defined algorithmic substrate (embedded push-down automata). 

The least powerful extension of context-freeness is achieved by linear- 
indexed grammars (Gazdar 1988), which characterize Linear-indexed Lan- 
guages (LILs). Lexicalized tree-adjoining grammars (LTAG; Joshi and Sch- 
abes 1992) and CCG are provably linear-indexed (Joshi, Vijay-Shanker and 
Weir 1991). The desirable features include (a) polynomial-time parsability 
and (b) the constant-growth property of MCSLs, which ensures that all the 
languages of this class have strings whose lengths grow linearly, and (c) ef- 
ficient parsability. Although all MCSLs are polynomially parsable, they are 
not all efficiently parsable, which LILs are. That is why they are the compu- 
tationalists’ choice of algorithmic substrate when full coverage of nested and 
crossing dependencies is attempted. An example of the latter is shown below. 


(26) ...omdat   ik₁ Cecilia₂ de  nijlpaarden₃   zag₁ voeren₂,₃          Dutch 
     ...because I   Cecilia  the hippopotamuses saw  feed 
     '...because I saw Cecilia feed the hippopotamuses.' 


We know for example that Shieber's (1985) Swiss German data and 
Huybregts's (1976) Dutch data such as above are provably above context- 
freeness, and properly within the class of nearly context-free languages, for 
there are LTAG and CCG grammars for them. 

Vijay-Shanker and Weir (1994) and Joshi, Vijay-Shanker and Weir (1991) 
proved and exemplified that for every combinatory categorial grammar, there 
is a linear-indexed grammar and vice versa. These grammars have nontermi- 
nals which can be associated with a stack, and the stack can be passed from/to 
the left nonterminal to/from a single nonterminal on the right-hand side of a 
rule, which restores our problem of radically lexicalizing CCG grammars be- 
cause it suffices to have a single symbol on the left-hand side of every rule. 

Translation to CCG is roughly as follows: CCG categories can be viewed
as their result category plus a stack-valued feature identifying their arguments
and the order of their combination. For example, NP is NP[], and S\NP_a/NP_b
is S[NP_a, NP_b] in the stack-equipped nonterminals of a linear-indexed
grammar. The '/NP_b' must be on top of the stack because it is the first
argument to combine in the CCG category. Thus the stack preserves the relative
order and currying of the CCG category.

The linear order of arguments, for example in 'S\NP_a/NP_b', is encoded in
the grammar rule, not in the stack. In this case the linear-indexed rule would
be S[..] → NP_a V NP_b, if we think of S[.., NP_a, NP_b] as V's category.
Since every linear-indexed language has a linear-indexed grammar, radical
lexicalization up to and including Dutch and Swiss German crossing
dependencies is complete.
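
The translation just described can be sketched in a few lines of Python. This is only an illustration under my own assumptions: categories are encoded as nested triples (slash, result, argument), and the function name to_stack_form is hypothetical. It splits a category into its result and a stack of arguments, with the first argument to combine on top, and it drops slash directions because linear order belongs to the rule rather than to the stack.

```python
from typing import List, Tuple, Union

# A category is an atom (str) or a triple (slash, result, argument).
# This encoding is an assumption made for the sketch.
Cat = Union[str, Tuple[str, "Cat", "Cat"]]


def to_stack_form(cat: Cat) -> Tuple[str, List[Cat]]:
    """Split a category into its result atom and a stack of its arguments.

    The last list element is the top of the stack, i.e. the first argument
    the category combines with. Slash directions are dropped, since linear
    order is encoded in the grammar rule rather than in the stack.
    """
    stack: List[Cat] = []
    while not isinstance(cat, str):
        _slash, result, argument = cat
        stack.append(argument)  # outermost argument is collected first
        cat = result
    stack.reverse()  # outermost argument ends up on top (last position)
    return cat, stack


# S\NP_a/NP_b written as nested triples.
transitive = ("/", ("\\", "S", "NP_a"), "NP_b")

print(to_stack_form("NP"))        # ('NP', [])
print(to_stack_form(transitive))  # ('S', ['NP_a', 'NP_b'])
```

The stack keeps the currying order only; a full linear-indexed grammar would pair it with rules that fix the directions.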

I show CCG's handling of the Swiss German crossing dependencies in 
Figure 5. The indices in the figure are meant to help trace the derivation
of the correct semantics. Steedman (2000b) shows the Dutch case. I chose
Swiss German because the requirements seem more strict on the syntactic 
and the semantic side. All arguments in a subordinate clause are case-marked 
in Swiss German, and they must match the subordinate verbs' case require- 
ments; see Shieber (1985) for discussion. The derivation's mechanism is the 
topic of the next section.


4. Invariants of natural language combination 


CCG claims that there are two kinds of semantic dependencies which have a 
direct reflection on syntactic processes: invariants, which need not be stipu- 
lated in the grammar of every language (the so-called universal dependen- 
cies), and lexicalizable dependencies that need to be part of a language's 
grammar. The syntacticization of semantic dependencies by combinators 
serves both resources, thus we need empirical and theoretical grounds to de- 
cide whether a dependency is lexicalizable or not, and whether it should be 
lexicalized if it is lexicalizable. 

An example of forced lexicalization is Inuit's constraint that ergative NPs
cannot be relativized (Manning 1996). This is something the head of
relativization must enforce, say by requiring a domain of type S\NP_abs,
because the language is verb-final with relatively free word order, hence an
extraction domain such as S\NP is clearly possible but not opted for by Inuit.
It has transitive participial forms, which could easily allow ergative NP
extraction if not constrained in the grammar of Inuit.

Figure 5. Swiss German crossing dependencies in CCG.

We can think of Ross's (1967) Coordinate Structure Constraint, its
exceptions and exceptions to exceptions, as examples of global asymmetries
captured by invariants without further assumption in the lexicalized grammar,
as shown in §2(14). No special constraint is needed to capture these
properties. The lexical constraint on the coordinator, that it requires
like-categories to maintain the semantics of coordination, is motivated
independently of extractability and nonextractability. There will be no
freely operating "relativization combinator" or "coordination combinator,"
and we would expect constituents that undergo these constructions to be quite
opaque to the lexically licensed meanings of relative markers and coordinators.

The notions of redundancy and opaqueness to syntactic processes, there- 
fore flexible constituency, play a decisive role in determining the invariants. 
As the discussion in Chapter 4 implied, the kind of work that ternary and qua- 
ternary combinators do at their defined arities can be done by lower arities 
and application. This was shown for ternary B, C and S syntactically.
Similar results await quaternary combinators. Since we know from Schönfinkel's
original work that S and K are good enough to capture all effectively
computable dependencies (and more), the faithful syntacticization of the
combinators without extra assumptions suggests that the same holds for the
syntactic variety of other combinators.

S is ternary and K is binary, and we know that ternary S is redundant if we 
have binary B, unary W and unary C. I repeat this result here, from §4(37): 


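(27) ω_1           ω_2       ω_3
     (X/Y)/Z: f    Y/Z: g    Z: a
     ------------------------------ S
     X: fa(ga)

     ω_1           ω_2       ω_3
     (X/Y)/Z: f    Y/Z: g    Z: a
     ----------- C
     (X/Z)/Y: Cf
     -------------------- B
     (X/Z)/Z: B(Cf)g
     -------------------- W
     X/Z: W(B(Cf)g)
     ------------------------------ app
     X: W(B(Cf)g)a = fa(ga)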
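
The equivalence at the bottom of this derivation can be checked mechanically. The following Python sketch is an illustration only: the combinators are written as curried functions, and f and g are hypothetical symbolic functions that record how they were applied, so that the two results can be compared as terms.

```python
# Curried definitions of the combinators used in the derivation.
B = lambda f: lambda g: lambda x: f(g(x))      # composition
C = lambda f: lambda x: lambda y: f(y)(x)      # argument swapping
W = lambda f: lambda x: f(x)(x)                # duplication
S = lambda f: lambda g: lambda x: f(x)(g(x))   # substitution

# Symbolic placeholders (assumptions for the sketch): they build a record
# of the application instead of computing anything.
f = lambda x: lambda y: ("f", x, y)
g = lambda x: ("g", x)
a = "a"

direct = S(f)(g)(a)              # fa(ga) via ternary S
derived = W(B(C(f))(g))(a)       # W(B(Cf)g)a via unary C, binary B, unary W

print(direct)             # ('f', 'a', ('g', 'a'))
print(direct == derived)  # True
```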

Binary B is indispensable for purely adjacency-based solutions to examples
such as (28a). The critical point of the derivation is shown in (28b). It is
also justified by the constituent behavior of the same substring, for example
Who do you believe that Mary likes and John detests?


(28) a. Who do you believe that Mary likes?              Szabolcsi (1989)
     b. Who do you believe that   Mary        likes?
                                  NP          (S\NP)/NP
                                  -------- T
                                  S/(S\NP)
                                  --------------------- B
                                  S/NP

Unary W seems empirically undesirable. Szabolcsi (1989) observes that 
we have yet to find a language in which an expression related to the one 
below means John turns himself. It would require unary W as shown. 

(29) John   turns
     NP     (S\NP)/NP
            --------- W
            S\NP
     ---------------- app
     S

The point of course is not that the word ‘turn’ might mean ‘turn himself’ 
in this example, but in a syntacticized system where combinators do their 
work by syntactic types, i.e. by being opaque to the lexical meaning of turn, 
the rule above would also engender John reads, John devours, to mean John 
reads himself and John devours himself. Similarly, a binary W is problematic. 
The same example can be derived by binary W as follows: 

(30) John   turns
     NP     (S\NP)/NP
     ---------------- W
     S


Thus we have good empirical reasons not to have W in syntax at all. This 
result might appear to make the ternary S nonredundant (see 27). First I note 
from Szabolcsi (1989) that binary S is certainly operating in syntax because 
we know the existence of languages with parasitic gaps. The crucial involve- 
ment of S is shown below. 


(31) (articles) which I will   file     without         reading
                               VP/NP    (VP\VP)/C_ing   C_ing/NP
                                        ------------------------- B
                                        (VP\VP)/NP
                               ---------------------------------- S
                               VP/NP
                                                          Steedman (1988)

It is S semantics because articles is an argument of both file and read, and
without the first "gap" after file, it is ungrammatical, say *articles which I
will file the folders without reading.


78 | Combinatory Categorial Grammar 


Further evidence is from coordination: the articles which I will file with- 
out reading and report without contradicting. Now the redundancy of ternary 
S follows from the necessity of binary S and application (likewise the redun- 
dancy of ternary B, which also follows from binary B and application): 


(32) ω_1           ω_2       ω_3
     (X/Y)/Z: f    Y/Z: g    Z: a
     -------------------- S
     X/Z: Sfg
     ------------------------------ app
     X: Sfga = fa(ga)

What about unary B and unary S operating in syntax? Recall some syn- 
tacticized versions of these combinators in order to study what is at stake. 


(33) a. X/Y ⇒ (X/Z)/(Y/Z)          (unary B)
     b. X/Y ⇒ (X\Z)/(Y\Z)          (unary B)
     c. (X/Y)/Z ⇒ (X/Z)/(Y/Z)      (unary S)
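
To make the effect of these unary rules concrete, here is a small Python sketch that applies them to categories. It is an illustration under my own assumptions: categories are nested triples (slash, result, argument), and the helper names unary_b, unary_s and show are hypothetical.

```python
def unary_b(cat, z, slash="/"):
    """X/Y => (X|Z)/(Y|Z), where | is the given slash ('/' or '\\')."""
    s, x, y = cat
    if s != "/":
        raise ValueError("expected a forward-slash category X/Y")
    return ("/", (slash, x, z), (slash, y, z))


def unary_s(cat):
    """(X/Y)/Z => (X/Z)/(Y/Z)."""
    s_outer, xy, z = cat
    if s_outer != "/" or isinstance(xy, str) or xy[0] != "/":
        raise ValueError("expected a category of the form (X/Y)/Z")
    _, x, y = xy
    return ("/", ("/", x, z), ("/", y, z))


def show(cat):
    """Print a category with explicit parentheses around complex parts."""
    if isinstance(cat, str):
        return cat
    s, x, y = cat
    wrap = lambda c: show(c) if isinstance(c, str) else "(" + show(c) + ")"
    return wrap(x) + s + wrap(y)


print(show(unary_b(("/", "X", "Y"), "Z")))         # (X/Z)/(Y/Z)
print(show(unary_b(("/", "X", "Y"), "Z", "\\")))   # (X\Z)/(Y\Z)
print(show(unary_s(("/", ("/", "X", "Y"), "Z"))))  # (X/Z)/(Y/Z)
```

Because the functions look only at the shape of the category, they apply to any lexical item of that shape, which is exactly the overgeneration risk discussed in the text.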


A revealing empirical argument against unary B came from Szabolcsi 
(1989), who suggested that the syntactic behavior of complete constituents 
does not necessarily extend to incomplete constituents, which is precisely the 
effect of unary B. 

Consider the complementizer that, with the category S'/S_fin. We would not
want the incomplete version (S'\NP)/(S_fin\NP) which would be engendered
by unary B:


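(34) a. I think   that       Wittgenstein might have liked Kafka.
          VP/S'   S'/S_fin   S_fin
          ------------------------------------------------------ app
     b. *I think Wittgenstein   that          might have liked Kafka
                                S'/S_fin      S_fin\NP
                                --------- B
                                (S'\NP)/(S_fin\NP)
                                ------------------------------------ app
                                S'\NP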

Some complementizers in some languages might choose to make their
version of unary B grammatical, but this would have to be a lexical choice,
not engendered by syntax. In fact, English does just that: the forward variety
of unary B, viz. (S'/NP)/(S_fin/NP), gives exactly the same semantics as (34a):

(35) I think   that         Wittgenstein might have liked   Kafka.
               S'/S_fin     S_fin/NP                        NP
               --------- B
               (S'/NP)/(S_fin/NP)
               -------------------------------------------- app
               S'/NP
               ---------------------------------------------------- app


There must be language-specific constraints on unary B, for example
divide by '/NP' rather than '\NP' as above, hence it must be lexicalized.

Now consider Welsh to see the effects of a freely-operating unary S. Welsh
has a strict word order of VSO, which can be characterized as VSS' when
the argument is a complement clause S'. We can categorize the
complement-taking verb as such:


(36) Dymunai     Wyn   i Ifor ddarllen llfyr.              Awbery (1976: 37)
     Wanted      Wyn   for Ifor reading (a) book
     (S/S')/NP   NP    S'
     'Wyn wanted Ifor to read a book.'


A unary S must be lexically constrained because, although Welsh allows
subject-sharing complements (37a) (i.e. incomplete constituents), the word
order instigated by unary S from complement-taking verbs would be
ungrammatical (37b).

(37) a. Dymunai          Ifor   ddarllen llfyr             Awbery (1976: 39)
        Wanted           Ifor   reading (a) book
        S/(S'/NP)/NP     NP     S'/NP
        'Ifor wanted to read a book.'
     b. *Dymunai         ddarllen llfyr    Ifor
        (S/S')/NP        S'/NP             NP
        --------------
        (S/NP)/(S'/NP)

The Welsh verb must avoid unary S. The modalities cannot help in such
examples to eliminate them. Therefore unary S cannot be syntactically free.

Let us now take stock of what is needed in syntax in terms of dependency
and constituency, and what should be lexically controlled. The combinators
S, K, C, B, W and I play a crucial role in establishing the power of
combinators to capture any computable semantic dependency. The first two are
Schönfinkel's primitives, and the last four were Curry's primitives until he
encountered Schönfinkel's work in a literature search in 1927. He adopted
K immediately, and considered S to be somewhat artificial.

William Craig proves in his section §5H of Curry and Feys (1958) that
among this group an S-effect is impossible without B, C or W. Therefore
K can be ignored for the S-effect. A K-effect is impossible with the
remainder of the group. Without S and K, a B-effect is impossible. A W-effect
is possible without K. Thus {I, K} and {B, S, C, W} form two sets such that
any system that aims at behavioral equivalence to lambda calculus must
contain one combinator from each.

S and B are not interdefinable if we eliminate C and K. Similarly, C and 
W are not interdefinable if we eliminate S and B. On what basis do we choose 
a set of combinators that always operates on syntax? Szabolcsi offers a formal 
criterion in addition to the empirical ones we have seen so far: 


(38) The combinators running free in syntax are (a) noninterdefinable, or 
(b) compositions of such noninterdefinable combinators. Other derived 
combinators are lexicalized. Szabolcsi (1989: 305) 


This hypothesis is not sufficient to rule out K and I from syntax. K is not
interdefinable by the remaining five combinators, and without K, I is not
interdefinable by BSC either. The criterion therefore is meant to supplement
the empirical reasons rather than replace them.

The following desiderata emerge from interdefinability and from the limits 
of dependencies attested in natural languages: 


(39) (i) K is not desirable because (a) its lexical effect has not been
         attested in languages, (b) its power of deletion is a threat to
         decidability.

    (ii) Any slash is implicitly an I in terms of semantic dependency. I adds
         nothing to syntax, but it must play a crucial role in the lexicon.

   (iii) B seems inescapable, otherwise we cannot surpass the context-freeness
         barrier. Application is good enough for context-free dependencies
         (Bar-Hillel, Gaifman and Shamir 1960). Without K, B cannot be
         defined by SCIW. With IK gone from syntax, the remainder SCW cannot
         achieve a B-effect.

    (iv) Some manifestation of the CW effect is needed, which brings S into
         the discussion. This can be done by the sequence BST because
         C = B(T(BBT))(BBT) as Church (1940) and Szabolcsi (1989) noted, and
         W = ST. It can also be done by BCW because S = B(B(BW)C)(BB).
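
The three identities in (39iv) can be verified by running them. The Python sketch below is an illustration only: combinators are curried functions, T is taken to be type raising (Txf = fx), and f and g are hypothetical symbolic functions that record their arguments so results can be compared as terms.

```python
B = lambda f: lambda g: lambda x: f(g(x))
C = lambda f: lambda x: lambda y: f(y)(x)
W = lambda f: lambda x: f(x)(x)
S = lambda f: lambda g: lambda x: f(x)(g(x))
T = lambda x: lambda f: f(x)  # type raising: T x f = f x

# Definitions from (39iv), written with explicit applications.
BBT = B(B)(T)
C_from_BT = B(T(BBT))(BBT)            # C = B(T(BBT))(BBT)
W_from_ST = S(T)                      # W = ST
S_from_BCW = B(B(B(W))(C))(B(B))      # S = B(B(BW)C)(BB)

# Symbolic placeholders (assumptions for the sketch).
f = lambda x: lambda y: ("f", x, y)
g = lambda x: ("g", x)

print(C_from_BT(f)("x")("y") == C(f)("x")("y"))    # True
print(W_from_ST(f)("x") == W(f)("x"))              # True
print(S_from_BCW(f)(g)("x") == S(f)(g)("x"))       # True
```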


Cases (i) and (iv) need empirical support. No language seems to have the
K-like vacuous abstraction exemplified below: 




(40) * WHAT does Mary like Bill? Szabolcsi (1989: 3b) 


Notice that this is different from the apparently related German example
below, where wh-in situ is grammatical. There is no vacuous abstraction here,
since the two occurrences of wer are one and the same.


(41) Wer glaubst du wer nach hause geht? Crain and Pietroski (2001) 
Who do you think who goes home? 


The closest example I could think of for vacuous abstraction is the headed
morphological compounds of German (42): "Genitive case endings function
as morphological 'glue' when their use would be disallowed in the
corresponding noun phrase" (Payne 1997: 93).


(42) Bischoff-s-konferenz (Anderson 1985) 
bishop-GEN.sg-conference 
'conference of bishops'
* for 'conference of bishop'


The process is quite productive, and from the perspective of the
constituents of the compound, -s- seems like K's victim, with the semantics
K(b'k')s' = b'k'. The primed semantic objects stand for the semantics of
Bischoff, Konferenz and -s- respectively.

But it could also be that -s- is another lexical item in German, different 
than its genitive case marker interpretation, which yields a morphologically- 
headed compound as Payne suggested. Thus if we are willing to extend our 
notion of lexicon to include objects with categories other than words, we have 
an analysis without K. No such freedom seems to exist for what in (40).

As for the CW effects, we have seen empirical reasons for W not to op- 
erate in syntax, therefore it must be lexicalized. The question then is the fol- 
lowing: do we lexicalize C, or is it free in syntax in some arity? According 
to Szabolcsi’s formal criterion (38), its lexicalization depends on whether we 
have T in syntax, because C = B(T(BBT))(BBT). 

A freely-operating T, in the truest sense of the term, is redundant in
binary form if the unary version is available (§4.1). And, as we shall see, the
unary version must be available in syntax in a constrained way. These results
altogether suggest that C must be lexical.

Empirical reasons complement the picture by suggesting lexicalization
as well. Take for example the VSO language Welsh. The category of
the transitive verb is (S/NP_2)/NP_1, where NP_1 stands for the subject NP
for convenience (Welsh has no morphological case). Unary C would yield
(S/NP_1)/NP_2, which is equivalent to saying that VOS order would be
grammatical too, which is not true for Welsh. A binary C would yield S/NP_1
from the configuration (S/NP_2)/NP_1 NP_2, which also amounts to licensing
VOS for Welsh. Judging from Steele's (1978) typological study of limited
appearance of alternative word orders, this process must be lexically
controlled in all languages.

The occurrence of strict word-order languages suggests that we should not
employ C to understand the free word-order effects of scrambling languages, 
unless we are willing to entertain parametric competence grammars where 
one set of combinators prevails over others depending on some kind of pa- 
rameter setting over the universal repertoire. As there is no initial-state uni- 
versal grammar in CCG that “grows into" an adult-state grammar, there is no 
room for a parametric combinatory base either, thus the prediction of CCG is 
that any C-effect must be specified in the lexicalized grammar of a language. 

Next I show that the syntactic common core of CCG, the BTS system, is 
computationally well supported. Then we look at the additional assumptions 
about combining variable-free syntax with variable-friendly semantics in the 
next chapter. 


5. The BTS system 


Adjacency as an auxiliary assumption was deemed detrimental because
combinators cannot handle wrap (§1). A C in the lexicon is not wrap because it
does not wrap strings but syntactic and semantic types. Recall Szabolcsi's
(1989) category (S\NP)/NP/PP for introduce, rather than the surface word
order (S\NP)/PP/NP, which was motivated by binding possibilities, which
required unary C to apply lexically to (VP/NP)\(VP/PP/NP). We may consider
this move as the abandonment of some nonconstituent coordination
analysis (43), but this is an issue within reach of combinators, and its
resolution is not our concern here. It is important that it does not violate
adjacency.


(43) John announced Mary and introduced Harry to the party crowd. 


We take adjacency as a fundamental assumption to look at its full conse- 
quences, rather than bring it in when necessary. 

The mild context-sensitivity result of Vijay-Shanker and Weir (1994) for 
CCG holds only if a bounded use of powers is employed, i.e. B" and S”” 
for some m,n. Recall that B” = BBB"-! and S"" = BS"S""! The second 
clause of (38) predicts their free operation in syntax because each is
equivalent to BXY for some noninterdefinable X and Y, as Szabolcsi observed.
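
The recursive shape of the powers of B can be illustrated in Python. The sketch covers B only and rests on two assumptions of mine: the recursion is read as B^n = B B B^(n-1) with B^1 = B, and the function name b_power is hypothetical. Under that reading B^n composes f with an n-argument function g.

```python
B = lambda f: lambda g: lambda x: f(g(x))


def b_power(n):
    """Return B^n, built as B B B^(n-1), with B^1 = B (assumed reading)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return B
    return B(B)(b_power(n - 1))


# Symbolic placeholders (assumptions for the sketch).
f = lambda v: ("f", v)
g3 = lambda x: lambda y: lambda z: ("g", x, y, z)

# B^3 f g x y z = f(g x y z)
print(b_power(3)(f)(g3)(1)(2)(3))  # ('f', ('g', 1, 2, 3))
```

Each power has the form BXY with X = B and Y = B^(n-1), which is the shape the second clause of (38) refers to.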

Hoffman (1993) showed that a freely-operating T gives us the strictly
nonlinear-indexed language {a^n b^n c^n d^n e^n | n > 0}. Current findings on
the adequacy of nearly context-free grammars and the inadequacy of context-free
grammars for linguistic description depend on the bounded use of B and T.

The T must be finitely schematized to maintain near context-freeness, 
which can be done by compiling over a radically lexicalized grammar to see 
all kinds of argument and result types. Alternatively, it can always be kept 
in the lexicon, which by definition would be a finite schematization. Let us 
consider both possibilities. 

The lexical T is by definition a unary T. Recall also that binary T is ren- 
dered redundant by the unary T and the primitive of the system. 

Regarding the possibility of a unary T-less syntax, it is not possible to 
always build T into the lexical categories of argument encoders such as deter- 
miners and case markers. Some languages lack determiners. Moreover, there 
are caseless languages, and also languages with morphological case where 
we need T in syntax although case is not involved. Consider some Turkish 
data in this regard. 


(44) [Gelin-e   ben-im    uyu-duğ-um-u],       [damad-a   Ahmet'in  çalış-tığ-ı-nı]
     Bride-DAT  I-AGR.1s  sleep-COMP-1s-ACC    groom-DAT  A-AGR.3s  work-COMP-3s-ACC
     söyle-miş.
     tell-PERF

     lit. 'S/he told the bride that I am sleeping and the groom that Ahmet is working.'

The string ben-im uyu-duğ-um-u must be type-raised (by T) and composed
(by B) with gelin-e, so that we can account for the unorthodox constituency
of Gelin-e ben-im uyu-duğ-um-u in coordination. This is shown below.
The second coordinand must do the same for its constituents.


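(45) gelin-e      ben-im uyu   -duğum        -u
     bride-DAT    I-1s sleep   -COMP.1s      -ACC
     NP_dat       S_1s         S'_1s\S_1s    (S\NP_nom\NP_dat)/
                                             (S\NP_nom\NP_dat\NP_acc)\S'
                  ------------------------------------------------------
                  (S\NP_nom\NP_dat)/(S\NP_nom\NP_dat\NP_acc)
     ----------
     (S\NP_nom)/
     (S\NP_nom\NP_dat)
     ------------------------------------------------------------------- >B
     (S\NP_nom)/(S\NP_nom\NP_dat\NP_acc)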

Leaving T to the lexical category of a case-marker such as the accusative
case on the nominalized verb, as is done above for uyu, will not always work,
because unmarked clauses must be type-raised as well in certain syntactic
contexts:


(46) Gelin-ce   ben-im    uyu-duğ-um,      damad-ca da  Ahmet'in  çalış-tığ-ı
     Bride-ESS  I-AGR.1s  sleep-COMP-1s    groom-ESS    A-AGR.3s  work-COMP-3s
     bil-in-iyor.
     know-PASS-PROG
     lit. 'It is known by the bride that I am sleeping and by the groom that Ahmet is working.'


Unless we lexicalize all Turkish subordinate clauses, which can be case- 
marked or unmarked nominalized clauses, T must be a lexical rule.

Another empirical reason for a schematized T is the word-internal
recursion in nominals. The Turkish relativizer suffix -ki can be attached to
case-marked nouns whose case relation is one of possession, time, or place
(i.e., the genitive and the locative), for example ev-in-ki (house-GEN-ki 'the
one of the house') and ev-de-ki (house-LOC-ki 'the one in the house').

Its effect is to create a nominal stem on which all inflections can start 
again. As Hankamer (1989) noted, there is no upper bound on this process of 
relativization (e.g. ev-i-nde-ki-ler-in-ki-ler-de-ki). 

It follows that these words must be derived in syntax (otherwise we would 
have an infinite lexicon). They can take part in nontraditional constituencies 
such as those below, which is possible in CCG only if these words are type- 
raised and composed, therefore type raising must be a rule. The critical step 
is shown in (47b). 


(47) a. [Ev-de-ki-nin-ki       adam-a],  [salon-da-ki  çocuğ-a]   sarıl-mış
         house-LOC-ki-GEN-ki   man-DAT   room-LOC-ki   child-DAT  hug-PERF
        lit. 'The one in the house's one hugged the man, and the one in the
        room the child.'
        e.g. 'The friend/acquaintance of the one in the house hugged the
        man, and the one in the room the child.'
     b. Evdekininki    adama
        NP             (S\NP)/(S\NP\NP_dat)
        ---------- T
        S/(S\NP)
        ----------------------------------- B
        S/(S\NP\NP_dat)
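
The critical step can be replayed with a small Python sketch of forward type raising and forward composition over categories. It is an illustration under my own assumptions: categories are nested triples (slash, result, argument), the function names are hypothetical, and show prints categories with the usual left-associative convention.

```python
def type_raise(arg, result="S"):
    """Forward type raising: A => T/(T\\A)."""
    return ("/", result, ("\\", result, arg))


def forward_compose(left, right):
    """Forward composition: X/Y  Y/Z => X/Z, or None if it does not apply."""
    if isinstance(left, str) or isinstance(right, str):
        return None
    s1, x, y = left
    s2, y2, z = right
    if s1 == "/" and s2 == "/" and y == y2:
        return ("/", x, z)
    return None


def show(cat):
    """Left-associative printing: only complex arguments get parentheses."""
    if isinstance(cat, str):
        return cat
    s, x, y = cat
    arg = y if isinstance(y, str) else "(" + show(y) + ")"
    return show(x) + s + arg


vp = ("\\", "S", "NP")                     # S\NP
adama = ("/", vp, ("\\", vp, "NP_dat"))    # (S\NP)/(S\NP\NP_dat)

raised = type_raise("NP")                  # S/(S\NP)
result = forward_compose(raised, adama)

print(show(raised))  # S/(S\NP)
print(show(result))  # S/(S\NP\NP_dat)
```

Restricting type_raise to a finite list of argument and result categories read off the lexicon would correspond to the finite schematization discussed in this section.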

Thus the only theoretical possibility to maintain near context-freeness
of CCG and to have a BTS system, given our current understanding, is to
finitely schematize the unary T as a universal lexical rule. Every language has
a finite vocabulary of argument categories, therefore it seems to be a feasible
solution. Since by this choice we keep B and T in syntax, C can be lexical.
Because we do not keep W or C in syntax, S can be syntactic.


We can now have a look at variable-friendly semantics in relation to 
BTS syntax. 


Chapter 6 
The LF debate 


This chapter is about apparently the least adjacency-related and most post- 
PADS related aspect of combinatory theorizing: the issue of having a Log- 
ical Form (LF) for narrowing down the possible interpretations, without a 
concomitant narrowing of possible constituents. The issue is also the most
divisive and perplexing. 

The reader is referred to better summaries and historical accounts such as 
Szabolcsi (1989, 1992, 2003), Jacobson (1999, 2002), Barker and Jacobson 
(2007), Steedman (1996a, 2011). I will reiterate their way of handling some
referential-interpretive phenomena, along with some assessment and predic- 
tions. 

Empirical concerns about constituency force the CCG variants to con- 
verge on a BTS syntax, where T must be constrained by the lexicon, either 
by type raising all the argument types in the lexicon, or by operating the unary 
rule under a limited domain and range, which can be compiled from the
lexicon. Adding unary BCWZ to this base, where B, C and W are constrained
by the lexicon (for example apply unary B to objects only, unary C to two-
or more-complement verbs, unary W to reflexives, and unary Z, viz. BSC, to
pronouns), is where CCG models begin to differ.

The BTS system alone is variable-free syntax that makes use of bound 
variables in epitheorems only (in Curry's sense; see the discussion on
page 31), related to the predicate-argument dependency structures (PADS).
These are the systems with a logical form, i.e. they employ a lexical use of
unknowns rather than variables. BCWZ systems on the other hand amount
to variable-free semantics, in addition to variable-free syntax. Binding of 
anaphors is handled by combinators as well, such as Jacobson's Z and a spe- 
cial unary B, and Szabolcsi’s W in the lexicon, which eschews Bach-style 
wrap, which has no combinatory counterpart. 

Jacobson (1999), Steedman (1996a, 2000b, 2011), Szabolcsi (1989) sum- 
marize what is at stake for each path. Szabolcsi's and Jacobson's arguments 
are both methodological, to culminate variable-free syntax with variable-free 
semantics, and empirical, for example whether we distinguish John left and 
He left syntactically, the first one as a sentence whose denotation is a
proposition, and the second as a function from an individual to a proposition.
Steedman's argument is from automata-theoretic concerns, to reduce the amount
of nondeterminism engendered by unary rules and eliminating additional re- 
source management needs such as a quantifier store, and also from cognitive 
science. He contrasts syntax-specific command relations which seem to defy 
traditional concepts such as c-command (e.g. an argument can be relativized 
independent of its c-commanding position) with the bound-element behavior, 
which seems to faithfully maintain such relations (e.g. reflexives and
reciprocals), suggesting a branching evolutionary pathway at work. Recall
Steedman's argument that reference avoids combinators and depends on logical
form, which he suggests might arise out of pressures for speedy processing.

There is another perspective that seems to call for a closer look at the prob- 
lem of LF. The syntactic dependencies engendered by syntactic processes are 
strict about the crossing or nesting kind (1a-b). But the semantic dependen- 
cies manifested by quantifiers and pronouns can cross and nest (1c-d). 


(1) a. A violin_i which this sonata_j is easy to play_j on_i
    b. *A sonata_i which this violin_j is easy to play_i on_j
    c. Every man_i thinks that every boy_j said that his_i mother loves his_j
       dog.                                                   (Jacobson 1999)
    d. Every man_i thinks that every boy_j said that his_j mother loves his_i
       dog.


The lexical predicate-argument structure and the semantic dependencies 
it represents, the PADS, must be distinguished from the notion of LF. The 
linguistic notion of LF is borrowed from logic, where it meant, through the 
works of Frege, Carnap, Russell, early Wittgenstein, Tarski, culminating in 
Montague (1974), a pristine form of logical aspects of a sentence cleared of
the surface characteristics such as inflection, agreement, word order, etc.

Chomsky's (1976) and May's (1977, 1985) LF is a structural domain at 
which not-so-pristine issues such as quantifier movement and semantic
reanalysis are handled, to the extent of having a separate syntax such as in
Pesetsky (1985, 1995). In the logician's case, nothing intervenes to provide a
model theory for LF (except some model-stage semantic storage and
reinterpretive operations) because scope and predicate locations are all in
place, whereas in the transformational linguist's case conditions must be
predicated over LF to get
them, and more significantly, we need covert operations of different kinds to 
get the right LF. The closest analogue of such operations in Montague is the 
quantifying-in rule, which introduces a prosodic variable to be substituted by 
a logical formula. 

In this sense Chomsky’s (1981) binding conditions A, B, C in (2) can be
looked at from two angles: (a) As theory-internal constraints at some level of 
representation, such as LF as an interface, or, as in earlier transformational 
accounts, as a constraint on the input and output of transformations, (b) as 
desiderata for any theory to account for the syntactic narrowing of reference. 
They are roughly reformulated below to avoid theory-specific terminology: 


(2) Condition A: An anaphor (reflexive or reciprocal) must be bound in a 
minimal tensed domain. 
Condition B: A pronoun must be free where an anaphor must be bound. 
Condition C: A referring expression must be free everywhere. 


We have seen options (a-b) implemented in CCG in various ways: (i) the
adoption of LF as a level, without a model-stage extra storage or reinterpre- 
tation, with conditions such as LF-command but without any special syntax 
associated with it. This is Steedman’s (2011) surface compositionality, which 
means every surface constituent is interpretable, with any unresolved refer- 
ence in it bound either by tandem deterministic LF operations in the course 
of a derivation, or left to discourse. (ii) The LF-less narrowing of syntactic 
types in the lexicon by a lexical use of unary combinators (Szabolcsi 1992). 
(iii) The traditional Montagovian LF-less model with unary rules and lexi- 
cal types for initiating, projecting and binding of bound pronominals, leading 
to Jacobson's (1999) direct compositionality ("direct" in the sense that every
semantic object that is compositionally derived is model-ready). 

As the brief descriptions suggest, the proposals conceive different ways 
to narrow down possible categories. Let us look at each alternative in some 
detail. 


1. Steedman's LF 


Steedman (1996b) defines LF-command as a substantive constraint on
possible categories, which is predicated over the LF. It is in this sense that
LF is the only structural level of representation in Steedman's CCG; all other
constraints, for example on syntactic types and derivational structures, are
completely eliminated by radical lexicalization. I provide a newer formulation
of LF-command from Steedman and Baldridge (2011).

(3) A node α in a logical form Λ LF-commands a node β in Λ      (LF-command)
    if the node immediately dominating α dominates β and α
    does not dominate β.
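
The definition of LF-command can be read as a simple predicate over trees. The Python sketch below is an illustration under my own assumptions: the Node class and function names are hypothetical, dominance is taken to be proper dominance, and the example tree is a guess at the shape of an LF for a reflexive sentence, with a nonbranching pro-term as the inner argument.

```python
class Node:
    """A logical-form node that knows its parent (hypothetical encoding)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def dominates(a, b):
    """True if a properly dominates b (assumption: dominance is proper)."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def lf_commands(a, b):
    """a LF-commands b if a's mother dominates b and a does not dominate b."""
    return a.parent is not None and dominates(a.parent, b) and not dominates(a, b)


# Assumed LF shape: [[like' ana'milena'] milena']
predicate = Node("like'")
pro_term = Node("ana'milena'")          # nonbranching pro-term
inner = Node("", [predicate, pro_term])
subject = Node("milena'")
root = Node("", [inner, subject])

print(lf_commands(subject, pro_term))   # True
print(lf_commands(pro_term, subject))   # False
```

The asymmetry between the two calls is what a condition stated over LF-command can exploit: the outer argument commands the pro-term, but not the other way around.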


The LF unknowns are of the kind ana'x, pro'x, which are nonbranching
pro-terms where x is identical to some element in the LF. In other words,
ana'kinski'kinski' is (4a) rather than (4b).


(4) a.       •                  b.       •
           /   \                       /   \
  ana'kinski'   kinski'               •     kinski'
                                    /   \
                                 ana'   kinski'

His binding theory reduces to one condition, which is similar to Condition C. 


(5) No node except the argument in a pro-term can be LF-commanded by 
itself. Steedman and Baldridge (2011) (Condition C) 


This condition eliminates (6a—b) as possible interpretations of otherwise 
grammatical examples. Condition A and Condition B are explained away by 
noting that reflexivization is lexicalized (i.e. it requires the lexical category 
of a verb), and pronominal binding (of x in pro’x) is not lexicalized. 


(6) a. She_*i liked Milena_i.
    b. I think she_*i liked Milena_i.
    c. Milena_i liked her_*i/herself_i.


Thus herself in an example such as (6c) would have access to all the
arguments in the LF of λx_1λx_2.like'x_1x_2, which means it can only substitute
for x_1. If herself has the semantics λPλx.P(ana'x)x, then we get LFs of the
sort in (7) once it combines with the verb.


(7)        •
         /   \
        •     milena'
      /   \
  like'   ana'milena'
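
How the reflexive's semantics produces this LF can be replayed in Python. The sketch is only an illustration: like and ana are hypothetical symbolic functions that build nested tuples standing in for LF terms, and herself encodes λPλx.P(ana'x)x directly.

```python
# Symbolic stand-ins for LF constants (assumptions for the sketch).
like = lambda x1: lambda x2: ("like'", x1, x2)   # λx1 λx2. like' x1 x2
ana = lambda x: ("ana'", x)                      # pro-term ana'x

# λP λx. P(ana'x) x
herself = lambda P: lambda x: P(ana(x))(x)

verb_phrase = herself(like)        # combines with the verb first
lf = verb_phrase("milena'")        # then with the subject

print(lf)  # ("like'", ("ana'", "milena'"), "milena'")
```

The pro-term fills the first argument slot of like', and the subject both fills the second slot and supplies the x inside ana'x, which is the sense in which reflexivization is tied to the lexical category of the verb.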

The analysis of her in (6c) is the main source of variation in Steedman’s 
CCG. Although there seems to be a recent consensus that condition B effects 
should be left to a discourse model (Jacobson 2007, Steedman 2011), there is 
some work done in LF in Steedman’s case to eliminate proliferating readings 
in examples such as below. He avoids semantically powerful yet syntactically 
innocuous operations such as an extra stack for scope-taking or the semantics- 
only type-change, which could in principle dispense with LF for handling this 
kind of work. 

(8) a. Every farmer who owns a donkey_i feeds it_i.
    b. All the girls admired, but most boys detested, one of the
       saxophonists.                                          Geach (1972)


Steedman's (2011) suggestion is that assuming an LF may give us 
surface-compositional readings only, with concomitant syntactic assumptions 
such as the type-raising of all arguments but generalized quantification of 
only the universal quantifiers. This is unlike the deletion accounts of 
transformationalism, which deliver too many readings for examples like (8b), 
and unlike strict Montagovianism, which would require extra devices on the 
semantic side for (8a-b). 

This is where his LF assumption begins to do more work than reflexivization 
and nonsubject pronominal binding. His Skolem terms, which are LF 
terms in need of a scoping universal quantifier, get the scope information 
and the terms of skolemization from LF-command. 

Although Steedman's introduction of Skolem terms in place of nonuniver- 
sal NPs gives us only the possible readings in (8), example (9a) is susceptible 
to his LF-term binding although there is no Skolem term, hence we need 
Condition B effects to rule it out. And, (9b)'s Skolem-term is not sufficient to 
eliminate binding in LF to it. We need to call in yet again condition B effects 
of discourse to the rescue. 


(9) a. Every donkey_i feeds it_{*i}. 
b. A donkey_i feeds it_{*i}. 


Thus Skolem terms and their tight management during the syntactic pro- 
cess sometimes need discourse conditions anyway, to find their antecedents. 
This is true of “donkey anaphora” as well. Consider (8a) in a context where a 
donkey named Balthazar is left to the common goodwill of the village, which 
gives us a free interpretation of it. 

We have yet to find cases where a quantifier-bindable pronoun can only 
have that reading. That would vindicate an exclusively grammatical solution 
to pronoun resolution in at least some constructions. We also have examples 
like (10), where an antecedent within a quantified NP not c-commanding (or 
LF-commanding) the pronoun is possible. 


(10) Every professor's_i neighbor respects her_i. (Postal and Ross 
2009: ex. 66) 


If this were the only reading, it would jeopardize a Skolem-binding solution 
of bound anaphora over an LF structure, because the potential antecedents 
of quantifier-bound pronouns are read off in the theory as the list of 
LF-commanding terms. 

As it currently stands, Steedman's LF-Skolem-command account must 
leave both bound and free interpretations of (10) to discourse. 


2. Szabolcsi's reflexives 


One useful consequence of assuming an LF-command and pro-terms is that 
we can universally rule out subject reflexives such as *sheself without an 
appeal to W or Z, thus without having to stipulate this constraint in ev- 
ery lexicalized grammar. The LF pred'(ana'x)x satisfies Condition C, but 
pred'x(ana'x), which would be engendered by the LF of *sheself, does not: 
(11) a. pred'(ana'x)x        b. *pred'x(ana'x) 
Szabolcsi's (1992) combinatory solution below to the same problem is 
LF-less, and therefore without c-command or its LF equivalent. Her claim is that the 
binding theory of (2) follows from combinatory assumptions about syntax- 
semantics, including the lexical assumptions about the predicate-argument 
structures. The relevant combinatory options are the lexical use of W and B. 


(12) a. sheself := *S/(S\NP3s) 
b. herself := (S\NP)/((S\NP)/NP): λfλx.fxx 


Example (12a) is an illicit type because the explicit involvement of W for 
reflexives presumes that we have a function with two or more arguments in 
the predicate-argument structure to begin with, which is inconsistent with 
this syntactic type. Assuming that subjects are universally type-raised, like 
all arguments, the impossibility follows without further conditions. That 
explains (13a) but not (13b), as Szabolcsi pointed out. 


(13) a. *Sheself left. Szabolcsi (1992) 
b. *Sheself sees everyone. 

The second example would require the category (S/NP)/((S\NP)/NP) for 
the nonsubject argument if it were grammatical. This would be different 
from (12b), as expected, but (12a) would allow it if we let unary B loose in 
syntax (divide (12a) by '/NP'), which is eliminated for independent reasons. 
This takes care of Condition A without an LF, as a consequence of the syntax 
and semantics of B and W. 
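
The dependence of W on a function of at least two arguments can be illustrated with curried functions. This is a hedged sketch, not Szabolcsi's formalization: LFs are strings, `see` and `leave` are made-up toy predicates, and Python's runtime error merely stands in for the type mismatch that makes (12a) illicit.

```python
def W(f):
    # The duplicator W: λfλx.fxx
    return lambda x: f(x)(x)

def see(x):
    # A curried two-place predicate: λxλy.see'xy
    return lambda y: f"see'{x}{y}"

def leave(x):
    # A one-place predicate: λx.leave'x
    return f"leave'{x}"

# 'herself' with W semantics applied to a transitive verb.
saw_herself = W(see)("m'")

# W applied to an intransitive has no second argument slot to duplicate into.
try:
    W(leave)("s'")
    sheself_left_ok = True
except TypeError:
    sheself_left_ok = False

print(saw_herself, sheself_left_ok)
```

The second call fails because `leave("s'")` is already a saturated term, which mirrors the remark that W presumes two or more arguments in the predicate-argument structure.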
A further condition is imposed on the lexicon: reflexives must apply to 
lexical items only, otherwise (14a) would be allowed. Lexicalization is needed 
because (14b) must be derivable, which shows that there are syntactically 
derived (S\NP)/NP types. 


(14) a. *Mary believes that John loves herself. Szabolcsi (1992) 
b. Who does Mary  believe     that    John         loves? 
                  (S\NP)/S'   S'/S    S/(S\NP3s)   (S\NP3s)/NP 
                  ----------------->B ------------------------>B 
                  (S\NP)/S            S/NP 
                  -------------------------------------------->B 
                  (S\NP)/NP 

I write the lexical constraint (the +LEX feature of the slash in Steedman and 
Baldridge 2011), as "E or ‘\’, with the interpretation that an item e.g. @ := 


A\ B requires a leftward type B to be lexical to yield A (likewise "E for the 
rightward variety): 


>B 


(15) B must be the type of a lexical item (the LEX convention) 
in A\B and A fB. 


Now the string believes that John loves bears the -LEX value, which accounts 
for (14) because herself bears the +LEX constraint. 

With or without LF, some right-node raising examples are forced to an el- 
lipsis analysis under the lexicalization of reflexives. The coordinate structure 
below does not bear a lexical type. 


(16) Kinski adored and Wittgenstein hated himself. 


The LF proposal is forced to a semantic “wrap” (i.e. C) analysis in En- 
glish ditransitives, and for VSO languages. I repeat Steedman and Baldridge's 
treatment of reflexives below to elaborate. 


(17) Mary         saw               herself. 
     S/(S\NP3s)   (S\NP)/NP         (S\NP3s)\[+LEX]((S\NP3s)/NP) 
                  : λxλy.see'xy     : λPλy.P(ana'y)y 
                  ------------------------------------------------< 
                  S\NP3s : λy.see'(ana'y)y 
The innermost lambda abstraction of three or more arguments is unavailable 
to the reflexive with its λPλy.P(ana'y)y semantics. We must schematize the 
types of herself to get the right semantics for these cases, which is nontrivial 
because it involves semantic wrap to get x in between ana'y and y below. This 
is harmless computationally because it is done in the lexicon. 


(18) Mary  gave                  herself                                  a present. 
           (S\NP)/NP/NP          ((S\NP3s)/NP)\[+LEX]((S\NP3s)/NP/NP) 
           : λxλyλz.give'xyz     : λPλxλy.P(ana'y)xy 
           -------------------------------------------------------------< 
           (S\NP3s)/NP : λxλy.give'(ana'y)xy 
We are similarly forced to an analysis involving semantic wrap in VSO 
languages (19a). (For brevity, NP↑ represents a type-raised NP.) First, notice 
that the LEX constraint applies to Welsh reflexives as well, although they 
are not string-adjacent to the verb as in English. Note also the knowledge 
of LF, where x(ana'x) rather than (ana'x)x is assumed for Welsh, because of 
VSO verbs, and also because of (19b). 


(19) a. Gwelodd              Wyn        ei hun 
        Saw                  Wyn        himself 
        S/NP/NP3s            NP↑        S\[+LEX](S/NP/NP3s)\NP↑ 
        : λx1λx2.see'x2x1    : λp.pw'   : λPλQ.P(λx.Qx(ana'x)) 
                             ----------------------------------< 
                             λQ.Qw'(ana'w') 
        -------------------------------------------------------< 
        S : see'(ana'w')w' 
        'Wyn saw himself.' Awbery (1976: 131) 


b. *Gwelodd ef ei hun Wyn 
    Saw     himself   Wyn 

The LF-less W semantics and lexical syntactic types for reflexives generalize 
nicely to λPλQ.P(λx.Qxx), as shown below, as an alternative to the LF 
account in (19a). 


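(20) Gwelodd              Wyn        ei hun 
     Saw                  Wyn        himself 
     S/NP/NP3s            NP↑        S\[+LEX](S/NP/NP3s)\NP↑ 
     : λx1λx2.see'x2x1    : λp.pw'   : λPλQ.P(λx.Qxx) 
                          ----------------------------< 
                          λQ.Qw'w' 
     -------------------------------------------------< 
     S : see'w'w' 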


3. Jacobson’s pronouns 


Jacobson’s starting point is that syntactic elements that seem like vari- 
ables, for example pronouns, do not necessitate variables in syntax or 
semantics. Working with combinatory-syntactic assumptions, she avoids 
transformational-style variables from the beginning (the empty categories), 
and suggests a binding scenario which takes place within the semantics of 
a specialized unary Z (specialized to apply to e-type NPs only, hence 
properly equipped to bind the right kind of pronouns-as-variables). With the help 
of a specialized unary B called g (for ‘Geach’), this move avoids the use of 
LF to account for the bound and free interpretations of pronouns. This way 
pronouns-as-arguments are forced to yield functions rather than propositions, 
therefore they make a finer distinction in possible syntactic types and bear 
empirical consequences. 

Her narrowing of the possibilities in the grammar-lexicon is roughly as 
follows. The reader is referred to Jacobson (1999) for a full exposition, and to 
Barker and Pryor (2010) for a computational model using monads (i.e. thread- 
ing of g-computations with z-computations). 

Pronouns are lexically (e,e)-types in her theory, which she translates 
syntactically as NP^NP. Syntactically this is the collection of all functions from 
NP types to NP types. I will call them exponent types for easier reference. It 
is conceived as a semantic narrowing of an NP with syntactic significance, 
because of the distinction from another collection of functions from NPs to 
NPs: NP|NP. 

The exponent types must be mediated in syntax to force the 
individual-to-proposition functional readings of (21a-b), rather than the propositional ones 
in (21c-d), because the verbs lexically do not know the distinction. This is a 
compelling argument for the syntactic narrowing of type S. 


(21) a. He left. (S^NP) 
b. Kafka adored her. (S^NP) 
c. John left. (S) 
d. Kafka adored Milena. (S) 


This Jacobson achieves with a specialized unary B, where Z=NP; cf. the 
syntactically freer one in §4(28). 


(22) X|Y : f ⇒ X^Z|Y^Z : λgλx.f(gx)    (g-Z) 


Since this is not syntactic B, the slash can bear any modality, not just "A. 
or ‘/’. We shall see later that this is further corroborated by the data; (38b) 
needs to apply this rule when the slash is ‘\,’. 

Now we can derive (21b) as a function from individuals to propositions, 
syntactically S^NP. This is different from deriving it as S|NP with the freer 
version of unary B because, syntactically speaking, the expression needs no 
arguments. 
(23) Kafka                adored               her. 
     S/(S\NP3s)           (S\NP)/NP            NP^NP 
                          ---------g-NP 
                          (S\NP)^NP/NP^NP 
                          ---------------------------> 
                          (S\NP)^NP 
     ----------g-NP 
     S^NP/(S\NP3s)^NP 
     ------------------------------------------------> 
     S^NP 


Jacobson’s way of handling the pronouns therefore needs no lexical dis- 
tinction between a contextually bound but syntactically free use of a pronoun, 
and a syntactically-bound pronoun. They both derive functions rather than 
propositions. I show the semantics to make this point explicit. Notice that the 
variable z below is not a syntactic argument because the syntactic type is not 
S|NP. 


(24) He               left. 
     NP^NP : λx.x     S\NP : λy.leave'y 
                      ---------g-NP 
                      S^NP\NP^NP : λfλz.leave'(fz) 
     -----------------------------------------------< 
     S^NP : λz.leave'z 
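
The effect of the g rule on the semantics, λgλx.f(gx), can be traced step by step. The sketch below assumes string-valued LFs and toy names (`g`, `he`, `leave`); it only models the semantic side of the derivation of He left, not the syntactic types.

```python
def g(f):
    # Unary B ('Geach'): λfλhλx.f(hx); the rule text writes the inner function as g.
    return lambda h: lambda x: f(h(x))

def he(x):
    # The pronoun as the identity function on individuals: λx.x
    return x

def leave(y):
    # λy.leave'y
    return f"leave'{y}"

# 'left' shifted by g, then applied to the pronoun.
he_left = g(leave)(he)

# The result is still a function; it awaits an individual from context.
print(he_left("z"))
print(he_left("kafka'"))
```

The point of the example is that `he_left` is callable rather than a finished term, which corresponds to the claim that He left denotes a function (S^NP) and not a proposition (S).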
The bound pronoun below is where her unary Z does its binding. This 
combinator is specialized in Jacobson's case to apply to NPs only; cf. the 
freer version §4(54). 


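(25) (X|_i NP)|_j Y : f ⇒ (X|_i NP)|_j Y^NP : λgλx.f(gx)x    (z-NP) 

(26) John         loves                  his mother. 
     S/(S\NP3s)   (S\NP3s)/NP            NP^NP 
     : λf.fj'     : λx1λx2.love'x1x2     : λx3.the-mother-of'x3 
                  -----------z-NP 
                  (S\NP3s)/NP^NP 
                  : λgλx.love'(gx)x 
                  ------------------------------------------> 
                  S\NP3s 
                  : λx.love'(the-mother-of'x)x 
     -------------------------------------------------------> 
     S : love'(the-mother-of'j')j' 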
Notice that the result is a proposition, not a function. (I eschew as Jacobson 
does the analysis of English genitives.) If John loves somebody else's mother, 
then we would get the function S^NP as expected. I leave the mechanism and 
its implications for binding to the much more detailed discussion in Steedman 
(2011) and Jacobson (1999). 
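
The contrast between the bound reading (via z) and the free reading (via g) of John loves his mother can be sketched as follows. This is an illustration under stated assumptions: LFs are strings, and `wrap`, `love`, `his_mother` and `john` are hypothetical helpers introduced only for this example.

```python
def wrap(term):
    # Parenthesize complex terms; atoms such as j' stay bare.
    return f"({term})" if "'" in term[:-1] else term

def g(f):
    # λfλhλx.f(hx)
    return lambda h: lambda x: f(h(x))

def z(f):
    # λfλhλx.f(hx)x : binds the pronoun inside the first argument
    # to the next argument of the verb.
    return lambda h: lambda x: f(h(x))(x)

def love(x1):
    # λx1λx2.love'x1x2 : object first, then subject
    return lambda x2: f"love'{wrap(x1)}{wrap(x2)}"

def his_mother(x):
    # λx.the-mother-of'x
    return f"the-mother-of'{x}"

def john(p):
    # Type-raised subject: λf.fj'
    return p("j'")

# Bound reading: z on the verb yields a proposition.
bound = john(z(love)(his_mother))

# Free reading: g on the verb and on the subject yields a function.
free = g(john)(g(love)(his_mother))

print(bound)
print(free("m'"))
```

Here `bound` is a finished term while `free` still awaits an individual, which matches the S versus S^NP outcomes described in the text.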

The unary Z assumption carries with it some complications for VSO 
languages. For example, Welsh bound anaphora (27) might need syntactic wrap 
to apply (25) to the right argument, to the verb's category S/NP3s/_W NP to 
get S/NP3s/_W NP^NP, where the slash subscript 'W' denotes wrap. 


(27) Mi newidith Siôn ei feddwl. 
PRT change.FUT.3s Siôn 3MS mind.INF 
'Siôn will change his mind.' Welsh; Borsley, Tallerman and Willis 
(2007: 52) 


Alternatively, we can consider another version of (25), viz. (28). Its work is 
shown in (29) for the bound-pronoun interpretation. 


(28) (X|_j NP)|_i Y : f ⇒ (X|_i Y^NP)|_j NP : λxλg.fx(gx)    (z'-NP) 


(29) Mi newidith              Siôn     ei feddwl 
     PRT change.FUT.3s        Siôn     3MS mind.INF 
     S/NP/NP3s                NP3s     NP^NP 
     : λx1λx2.change'x2x1     : s'     : λz.the-mind-of'z 
     -----------z'-NP 
     S/NP^NP/NP 
     : λxλg.change'(gx)x 
     ------------------------------------------------------> 
     S 
     : change'(the-mind-of's')s' 
     'Siôn will change his mind.' 
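
The argument-swapped variant of the binding rule, with semantics λxλg.fx(gx), can be checked against the Welsh verb semantics λx1λx2.change'x2x1. This is a minimal sketch with string LFs; `zprime`, `change` and `his_mind` are illustrative names, and only the semantic side is modelled.

```python
def wrap(term):
    # Parenthesize complex terms; atoms such as s' stay bare.
    return f"({term})" if "'" in term[:-1] else term

def zprime(f):
    # λfλxλg.fx(gx) : the binder x is consumed before the pronoun-containing argument.
    return lambda x: lambda h: f(x)(h(x))

def change(x1):
    # VSO verb, subject first: λx1λx2.change'x2x1
    return lambda x2: f"change'{wrap(x2)}{wrap(x1)}"

def his_mind(zvar):
    # λz.the-mind-of'z
    return f"the-mind-of'{zvar}"

bound = zprime(change)("s'")(his_mind)
print(bound)
```

Because the subject is supplied first in a VSO category, the rule has to take the binder before the function-valued argument, which is what distinguishes this variant from the version in (25).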
We are forced to get a free reading of 'his mind' from the 
individual-to-proposition interpretation of 'Siôn will change'. Its analysis is shown in (30). 
This string cannot be made a VP in any movementless theory, but it is 
indeed interpretable in CCG without extra devices, and it seems to suffice that 
it be a function so that individuals can take it as an argument to yield a 
proposition, via the S\(S/NP) type, or as a function to yield another function, via 
the S^NP\(S/NP)^NP type. 
(30) Mi newidith              Siôn     ei feddwl 
     PRT change.FUT.3s        Siôn     3MS mind.INF 
     S/NP/NP3s                NP3s     NP^NP 
     : λx1λx2.change'x2x1     : s'     : λx.the-mind-of'x 
     ---------------------------------> 
     S/NP 
     : λx2.change'x2s' 
     ---------g-NP 
     S^NP/NP^NP 
     : λgλx.change'(gx)s' 
     ------------------------------------------------------> 
     S^NP 
     : λx.change'(the-mind-of'x)s' 
We get for free the binding condition that an anaphor inside the subject 
cannot be bound by the object, if we assume the type-dependent solution to pronoun 
binding, rather than the structure-dependent solution of the familiar 
Chomskian kind, or Steedman-style LF as the level for binding. Given the NP^NP 
assumption for a pronoun, we cannot get a proposition (S) reading for the 
following example; it must at best be a function from things to propositions: 
S^NP. 


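(31) *Prynodd             ei awdur ei hun        y llyfr. 
      buy.PAST.3s         3MS author 3MS self    the book 
      S/NP/NP3s           NP^NP                  S\(S/NP) 
      ---------g-NP                              ---------g-NP 
      (S/NP)^NP/NP^NP                            S^NP\(S/NP)^NP 
      ------------------------------------> 
      (S/NP)^NP 
      ----------------------------------------------------------< 
      S^NP 
      *'Its own author bought the book.' Borsley, Tallerman and Willis (2007: 132) 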


In summary, the lexical type of a pronoun initiates, g projects, and z closes 
off the referential dependency of the bound pronoun, as in monadic com- 
putation. The process is an instance of threading the computation as z(g), 
as Barker and Pryor (2010) showed. This is not the only monadic aspect of 
CCG, as we shall see in Chapter 10. 

It seems possible, then, to find a purely type-dependent way to maintain 
Chomsky's binding conditions as desiderata to narrow down the syntactic 
types, rather than add some conditions on a structured domain like LF. There- 
fore Steedman's (2011) introduction of structure-dependence on the LF side, 
on top of type-dependence in syntax-semantics correspondence, can be 
interpreted as a plea for computational parsimony in parsing, competence and 
its evolution, i.e. as a computational (read: empirical) challenge to cognitive 
science. 

The counter-balance of the challenge is a long list of predictions we get 
from exponent types. For example: (a) syntactically differentiating the truly 
contextual pronoun binding versus its capture of an antecedent in syntax, so 
that for example an oracle can be called in to work depending on the parser's 
output when the result is S^NP rather than S. (b) The empirically discernible 
distinction we get about the meaning of John left versus he left, as pointed out 
by Jacobson (1996). (c) The prediction of resumptive pronouns as possible 
lexical items, because we can systematically relate nonextraction categories 
like (N\N)/S^NP to extraction categories (N\N)/(S/NP). Note that the g-Z 
rule or the z-NP rule does not apply, hence these must be lexically mediated, 
which befits resumptive pronouns. (d) Can syntax require a pronoun? 
Jacobson's NP^NP type predicts that it may take part in the domain of locality of a 
construction. 

I have no knowledge of such a finding, but the Welsh cael “get” passive 
comes close: 


(32) Cafodd Wyn ei rybuddio. 
Got.3s Wyn his warning 
“Wyn was warned.’ Awbery (1976: 210) 


Awbery (1976: 47) explains: “The passive sentence has a sentence-initial in- 
flected form of cael (get) of the same tense and aspect as the verb of the 
active. This is followed by a noun phrase identical to the object of the active. 
Then comes a pronoun of the same person, number and gender (if it is 3sg) 
as this noun phrase, and an uninflected form of the verb in the active.” 

Awbery’s data shows that what is dropped if the noun phrase after cael 
is a pronoun is the subject NP, not the possessive pronoun required by the 
passive: 


(33) Cawsom (ni) ein rhybuddio gan y ferch. 
Got.1pl (we) our warning by the girl 
“We were warned by the girl." Awbery (1976: 48) 


Therefore the pronoun is obligatory, and it is syntactically bound. It can be in 
the domain of locality of the head cael. The NP^NP type's relation to NP|NP 
is predictable too. For example, Turkish headless relatives (34a-b) are indeed 
pronominal, as the semantics implicated in the glosses show. They are derived 
from (NP/NP)\(S\NP) of the relative participle which yields NP/NP for the 
relative clause, as in the headed variety (34c-d). (The examples are repeated 
from §2(11).) 
(34) a. [[[Istanbul'a gid-en]NP/NP]-ler-i]NP^NP ben gör-me-di-m. 
        Ist-DAT go-REL-PLU-ACC I see-NEG-PAST-1s 
        'I did not see the ones that go to Istanbul.' 
     b. [[[Istanbul'a git-tik]NP/NP]-ler-im]NP^NP daha güzel-di. 
        Ist-DAT go-REL-PLU-POSS.1s more beautiful 
        'The ones with which I went to Istanbul looked better.' 
     c. [Istanbul'a gid-en]NP/NP otobüs 
        Ist-DAT go-REL bus 
        'The bus that goes to Istanbul' 
     d. [Istanbul'a git-tiğ-im]NP/NP otobüs 
        Ist-DAT go-REL.1s bus 
        'The bus with which I went to Istanbul' 


To recapitulate: employing the combinators for variable-free semantics 
does not seem to violate the transparent import of order-instigated seman- 
tics of combinators to their syntacticization. Doing without them forces us 
to make auxiliary assumptions. Moreover, some constituents seem to show 
asymmetric behavior regarding the exponent types. I exemplify some of them 
in the next section. These are new items on the research agenda for the entire family of 
CCG models. 


4. More on LF: Unary BCWZ, constituency and coordination 


In an LF-less system, we not only need Jacobson's (1999) unary Z but unary 
B as well, to account for multiple pronouns and their binding possibilities. 
The first example below is obtained if the verb said undergoes unary Z first 
and then unary B, as Jacobson (1999) showed. We get the second example if 
the order is reversed. 


(35) a. Every man_i thinks that every boy_j said that his_j mother loves his_i 
dog. (Jacobson 1999) 
b. Every man_i thinks that every boy_j said that his_i mother loves his_j 
dog. 


Recall the unary B's devastating effects on complete constituents, repeated 
here: 


(36) a. I think that Wittgenstein might have liked Kafka. 
          VP/S'  S'/Sfin  Sfin 
     b. *I think  Wittgenstein  that                 might have liked Kafka 
                  NP            S'/Sfin              Sfin\NP 
                                ---------B 
                                (S'\NP)/(Sfin\NP) 
                                ----------------------------------------> 
                                S'\NP 
Jacobson's account avoids this problem by keeping the complete constituents 
complete despite a unary B: the word that undergoes a type-shift to S'^NP/Sfin^NP 
by (g-NP) to eliminate (36b). 

Likewise, Szabolcsi’s use of unary BCW avoids deriving a noncon- 
stituent, by building them into the lexical categories. Therefore a BTS binary 
core syntax seems uncontroversial, except for Shaumyan (1977, 1987)-style 
combinatory semantics where two expressions are related by combinators, for 
example that man I hate him and I hate that man by K. That seems to have 
a different agenda than a search for a radically lexicalized adjacency system 
for grammar. 

Thus the theoretical differences come down to the interpretation of some 
empirical issues, repeated below: (i) He lost in (37a) is considered S by 
variable-friendly semantics and S^NP by variable-free, (ii) the asymmetry of 
binding in (37b-c) is attributed to LF conditions in variable-friendly systems 
and to lexical generalizations about arguments in variable-free, and (iii) the 
lack of respect to LF conditions in nonlocal constructions in (37d-e) and in 
relativization is handled by the conspiracy of lexical syntactic and semantic 
types in either view. (37f-g) are still divisive, as pointed out earlier. 


(37) a. Every man_i thinks (that) he_i lost and (that) Mary won. Jacobson 
(1999) 
b. *Sheself left. Szabolcsi (1992) 
c. *Sheself sees everyone. 
d. A violin_i which this sonata_j is easy to play _j on _i 
e. *A sonata_i which this violin_j is easy to play _i on _j 
f. Every man_i thinks that every boy_j said that his_j mother loves his_i 
dog. 
g. Every man_i thinks that every boy_j said that his_i mother loves his_j 
dog. 


The exponent vocabulary for syntactic and semantic types therefore creates 
not two incommensurate categorial landscapes (that would be the case for 
wrap systems), but some degree of freedom. 

The treatment of (37a) bears on constituency in an indirect way, in the 
sult categories of coordination, precisely because opinion is divided about the 
category of He lost and about the nature of extraction in resumptive pronouns. 
Consider the examples below. 


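(38) a. Every man_i loves and no man_j marries his_{i,j/*i/*j} mother. 


b. Every man_i thinks  he_i     lost           and            Mary won. 
                       NP^NP    S\NP           (X\⋆X)/⋆X      S 
                                ------g-NP     ---------------------> 
                                S^NP\NP^NP     S\⋆S 
                       ---------------------<  ------g-NP 
                       S^NP                    S^NP\⋆S^NP 
                       ---------------------------------------------< 
                       S^NP 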


As Jacobson (1999) points out, the NP^NP type for pronouns maintains (a) the 
across-the-board CSC asymmetry without extra assumption, that it is 
impossible to bind out of one conjunct in (38a), and possible to bind into just one 
in (38b), and (b) that the "like-category constraint" for CSC is not enough if 
we do not make the three-way {S, S^NP, S|NP} distinction. 

The derivation in (38b) maintains the “like category” explanation for co- 
ordination without extra assumption. It is not a violation of application-only 
modality of the coordinator 'and', because no new slashes are introduced by 
g-NP. We shall see in monadic computation (Chapter 10) that the slash in 
unary composition of (22) can indeed be without modality. 

Regarding the asymmetry in coordination in relation to pronominal ref- 
erence, we can look at the rightward conjuncts with functions rather than 
propositions. Interesting possibilities arise in a modalized CCG. Jacobson's 
suggestion of unary composition might appear to make coordinands suscep- 
tible to island violations, but it does not. We can maintain the islandhood of 
conjuncts by disallowing composition into them using the application-only 
modality. Jacobson's (1999) suggestion to type-raise the S of the leftward 
conjunct to S/(S\S) to derive (39a) avoids the composition of Mary won with 
and (39b). 

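(39) a. Every man_i thinks  Mary won          and                  he_i lost 
                            S/(S\S)           (X\⋆X)/⋆X            S^NP 
                            ------g-NP        ------g-NP 
                            S^NP/(S\S)^NP     (X\⋆X)^NP/⋆X^NP 
                                              ------------------------------> 
                                              (S\S)^NP 
                            ------------------------------------------------> 
                            S^NP 

     b. *Every man_i thinks he_i  Mary won    and          would lose. 
                                  S/(S\S)     (X\⋆X)/⋆X    S\NP 
                            S^NP              -g-NP        -g-NP 
                                  -------------------*** >B 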


In summary: in exploiting the degrees of freedom afforded by expo- 
nent types of Jacobson, lexical generalizations of combinators and variable- 
friendly logical forms, we are within the program of radical lexicalization. 
The unary combinatory rules have substantive constraints on them, or they 
are built into the lexical categories. In other words, they are lexical rules. No 
combination rule or lexical rule depends on LF in systems where it is posited 
as a level. The empirical coverage of constituency is the same, although some 
empirical assumptions, theoretical choices and predictions differ. 

Variable-free semantics spells a tightly controlled unary system with an 
interlocking choice of constraints on for example pronouns, different kinds 
of verb classes, reflexives, relative pronouns, object categories etc. Its highly 
nondeterministic type-shifting rules seem to add no more burden than the 
result that type-raising must operate as a universal rule anyway; it cannot 
be fully lexicalized. Its use of model-stage storage to take care of quanti- 
fier scope as done by Cooper (1983) does require another stack, but, as long 
as that stack does not interact with the parser’s category stack, having two 
stacks does not automatically give us Turing-completeness or a more liberal 
computation. On the other side, variable-friendly semantics of the LF kind 
is forced to posit a model theory over and above what the standard logics 
provide, such as for example in Steedman (2011). 
In both cases, surface compositionality is maintained, for a good reason. It 
appears that the logician's logical form is the cognitive scientist's and compu- 
tational linguist's predicate-argument structure and dependencies. The notion 
of LF is entirely uncontroversial in computational linguistics, to the extent 
that it is almost always implicitly assumed, because otherwise the task of 
using a grammar in both ways to parse and generate is unreasonably compli- 
cated. 

This LF is in most cases not Chomsky's or May's LF, because nobody bothers 
to check predication over such a level in the first place. Noise 
in the data (ambiguity, vagueness, misunderstanding, misperception, miscon- 
ception, misattention, misaction etc.) far outweighs the noise that might be 
introduced by not checking the LF conditions on the hypotheses. 

Cognitive scientists with a computational bent use LF as an approximation 
tion of PADS in learning syntactic categories from PF-PADS pairs where the 
category is the hidden variable (after all, it is not observable). To go from 
models to PADS in that task is complex, and the search space for the hidden 
variable is much less constrained. 

Recall also that the Condition A-like innate knowledge, that children 
never entertain the possibility of e.g. *sheself, can be subsumed by a con- 
spiracy of universal constraints on the lexicon: (a) that all arguments are 
type-raised, (b) argument-taking is combinatory knowledge (e.g. knowledge 
of W dependency presumes knowledge of curried transitivity, which also 
brings in coargumenthood without further assumptions), (c) lexicalizable 
variables—pronouns—are not semantic variables but unknowns. 

A linguistic representation of semantics can be an uncontroversial as- 
sumption, independent of whether we posit a Steedman-style LF without ex- 
tra syntax, a Pesetsky (1985)-style LF with its own syntax, or a Montague- 
style derivation structure where some scope bookkeeping is sufficient for a 
model-theoretic interpretation. 

This LF is linguistically interesting to the extent it represents or models 
asymmetries, such as scope and binding. There is no language with a 
subject reflexive. Logically it seems perfectly possible, as say (∀x)(x=mary' → 
see'xx), which would be a legitimate logical representation for Mary saw 
herself, as well as for *sheself saw Mary, and *Herself saw Mary. 

Some striking counterexamples to this long-standing observation have 
been shown by Postal and Ross (2009). English, Albanian and Greek inverse 
reflexives, which are the least oblique (subject) reflexives with clausemate 
antecedents, strengthen the need for a linguistic representation because they 
require, according to Postal and Ross, the notion of derived subject, a strictly 
linguistic concept, as in Relational Grammar (see Blake 1990 for RG con- 
cepts). 

Consider another case for a linguistic representation. The Turkish plural 
marker must be considered polysemous if we want to eschew an LF represen- 
tation. We have (40a), in addition to the nonlocative extensional interpretation 
of the plural (40b). 


(40) a. Yarın akşama Ahmet'lere davetliyim. 
Tomorrow night-DAT A-PLU-DAT invited-1s 
'I am invited to Ahmet's for tomorrow night.' Turkish 
b. Kendini kitaplara verdi. 
self-ACC book-PLU-DAT give-PAST 
‘S/he gave himself/herself to the books.’ 


The expression in (40a) is three-way ambiguous: (1) There may be more 
than one person at Ahmet's, with Ahmet being the representative of the group, 
(2) there might be only Ahmet at Ahmet's, or (3) there might be somebody 
else, or even no one, at Ahmet's. In the last case the speaker would know 
the place as Ahmet's, just as s/he would know Mehmet's, Ayse's, Mary's 
as places, thanks to the plural. The first reading is closest to an extensional 
interpretation of the plural, but the other two are intensional. That kind of 
polysemy-turned-ambiguity might render the idea of radical lexicalization 
vacuous, because any marker can be intensional or extensional in this regard: 


(41) a. dünyanın tepesi 
world-GEN top-POSS 
'the top of the world' Turkish 
b. adamın arabası 
man-GEN car-POSS 
‘the man’s car’ 


A Montague-style intensional logic (IL) has room to work from a type say 
plu,’ but the core translation of Montague's IL is disambiguated, therefore 
we would need two types or two rules to intensionalize and extensionalize 
the plural. A PADS presentation could have one entry to be mapped to Mon- 
tague's intensional-extensional world. Partee and Rooth (1983) show how 
type-shifting can relate one grammatical object with many model-theoretic 
objects. 
In regard to the combinatory syntactic knowledge of plurality, there is no 
distinction between the intensional and extensional interpretation, hence we 
would expect a single category. As a knowledge of the full interpretability of 
a meaning-bearing element, we can conceive a two-way IL translation both 
of which are disambiguated, or use Partee and Rooth's idea to define a function 
from one PADS object to a powerset of a finite set of types, which would also 
secure a lexical representation along with PADS. This does not directly relate 
to meanings out there but to model-theoretic constraints on PADS objects 
like plu', hence it can be considered part of competence because it is linked 
to PADS, which is an essential part of a category. The noncommittal view of 
PADS toward truth conditions is also defended on the following grounds. 


Language embodies no particular metaphysics; it embraces both Realism and 
Psychologism. However, psychology has the last word. Whatever the seman- 
tics of a term, its relation to the world depends on human cognitive capacity. 
A word with a Realist semantics would only be coined or maintained in use 
by virtue of its associated mental schema. Likewise, whatever the semantics 
of a term, it is not mentally represented in isolation. Johnson-Laird 
(1983: 204) 


The narrow research program pursued here is that, whatever the nature of rep- 
resentation of semantics is, it must relate to syntax compositionally, because 
it is one end of the syntactic process. Whether it spells a truth-conditional 
semantics or some kind of mental and social world of thoughts and concepts is 
implicated here to be an interface issue; see Chapter 9, in particular §9.3 and 
§9.10, for further discussion. The topic is an open debate in cognitive science; 
witness a recent target article of Feldman (2010) and subsequent discussion 
in the same volume, with responses and criticism by Allen, Partee, Steels and 
Steedman. 


Chapter 7 
Further constraints on possible grammars 


A CCG grammar is a finite set of lexicalized category assignments to strings. 
The language of the grammar is its closure on the invariants listed in Ta- 
ble 2. Thus everything projects from the lexicon, because the invariants do 
not encode any language-specific information. It follows that all substantive 
constraints must be enforced on the lexicalized syntactic types, because the 
syntactic process is completely syntactic type-driven. 

A lexical category must therefore capture all the syntactic and semantic 
dependencies as knowledge of that string, say a word, since no other knowl- 
edge can be added during the syntactic process, and none deleted. 

Steedman offers the following principle as a constraint on possible cate- 
gories. 


(1) The Principle of Categorial Type Transparency: (PCTT) 


For a given language, the semantic type of the interpretation together 
with a number of language-specific directional parameter settings 
uniquely determines the syntactic category of a category. Steedman 
(2000b: 36)

The principle works both ways (Steedman calls the syntax-to-semantics
mapping the inverse of (1)). The semantic type of an interpretation is
entirely determined by the syntactic type:


(2) Take T to be the type relation with an inverse. If α has the syntactic
type A and β type B, then T(α, β) = Tβ | Tα = B|A, for some ‘|’.
If (α, β) has a basic type A, then T(α, β) = A. Inversely, T⁻¹(B|A) =
(T⁻¹A, T⁻¹B) = (α, β), for A of type α and B of type β. T⁻¹(A) = α for a
basic type A of type α.


For example, assume the following types for English. 


(3) S:t Kafka died. 
S: (e,t) Kafka adored 
NP:e Kafka 
N: (e,t) man 


Given these types, S\NP can be (e,t) (functions onto propositions), or
(e, (e,t)) (functions onto functions, where for example the result function
wants a discourse participant). We need not eliminate the second variety from
theory (perhaps we cannot), when experience can sort it out. The S/NP can
be (e, (e,t)) (functions onto predicates), or (e,t) (functions onto propositions,
where for example the subject of the action is implicit, say self′). In the last
case, we can safely assume that the implicit participant is not the syntactic
object, because English subjects are not compatible with ‘/NP’s, therefore
that NP must be the object.

The principle suggests that, given these English-specific pairs, a category
such as S/(S\NP) cannot be anything other than ((e,t),t) if S is t, and a
category such as S\(S/NP) can only be ((e, (e,t)), (e,t)) if S is (e,t).
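
The syntax-to-semantics direction of the principle can be sketched as a short recursive function. This is only an illustration under stated assumptions: the tuple encoding of categories, the name sem_type, and the two dictionaries of basic types are hypothetical conveniences introduced here, not notation from the theory.

```python
# A category is either a basic symbol ("S", "NP") or a triple
# (result, slash, argument). A semantic type is "e", "t", or a pair
# (argument_type, result_type). Both encodings are assumptions of this sketch.

def sem_type(cat, basic):
    """Read the semantic type off a syntactic category, given the
    language-specific semantic types of its basic categories."""
    if isinstance(cat, str):
        return basic[cat]
    result, _slash, arg = cat
    # The slash direction plays no role in the semantic type.
    return (sem_type(arg, basic), sem_type(result, basic))

S_BACK_NP = ("S", "\\", "NP")   # S\NP
S_FWD_NP = ("S", "/", "NP")     # S/NP

S_IS_T = {"S": "t", "NP": "e"}            # S as a proposition
S_IS_ET = {"S": ("e", "t"), "NP": "e"}    # S as a function

# S/(S\NP) with S of type t
raised_subject = sem_type(("S", "/", S_BACK_NP), S_IS_T)
# S\(S/NP) with S of type (e,t)
raised_object = sem_type(("S", "\\", S_FWD_NP), S_IS_ET)
```

Under these assumptions raised_subject comes out as ((e,t),t) and raised_object as ((e,(e,t)),(e,t)), the two types named in the paragraph above.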

PCTT is a relation, not a function with an inverse. For example, it is
entirely possible that nominals get two categories in a language, say NP: e
(proper names), and NP: (e,t) (properties). Then S\NP's semantic type can
be (e,t) or ((e,t),t). What it does not allow is this: if X is of type α and
Y of type β, then X|Y cannot be anything other than (β, α). Given a lexical
pair of types, they are functionally dependent on each other.

Take for example N: λx.man′x and S\NP: λx.sleep′x. The x of man′ is
not a syntactic variable. We can deduce this property from the semantic type
of man′, which is (e,t). The x of sleep′ must be a syntactic variable, which
corresponds to the ‘\NP’ of S\NP. Thus lambdas are not nominally designated
as syntactic or semantic. These properties follow from their lexicalized
syntax translated from dependency semantics via adjacency. N cannot have
a syntactic argument glued (by ‘:’) to a semantic object. S\NP cannot take
place in syntax without a syntactic argument glued to its participant role.

Jacobson's (1999) pronouns, and the proposition versus function distinction
of S, can be covered by PCTT as well. Assuming (e,e) for NP^NP as she
does, we are forced to an (e,t) interpretation of S^NP where the e is not a
syntactic argument, because the syntactic type is not S|NP. Since PCTT is
not a function, we are not forced to assume that an S is always t type (that
possibility would rule out a function interpretation of S, such as functions
from individuals to propositions as in pronouns). It can be (e,t).

The use of lambdas as the glue language of the ‘:’ relation in
syntax-semantics correspondence therefore depends on the semantic types.
Eta-normalization can eliminate variables from (e,t) types of various syntactic
functions, e.g. from N: λx.man′x and S\NP: λx.sleep′x, which reveals the
explicit role of the slash in syntactic argument-taking as a reflection of
semantic argument-taking. The potential confusion about whether lambdas are
syntactic or semantic abstractions can be avoided if we use typed objects all
the time, for example to claim that sleep′ is a one-argument syntactic
function which also happens to be a one-argument semantic function, and man′ is
a zero-argument syntactic function which is a one-argument semantic
function.

Using adjacency formulations of argument-taking over strings makes the
distinction explicit. The Schönfinkel-Curry arity of man′ is man′, i.e. zero.
The arity of sleep′ is 1, from B¹Isleep′.96

Thus the number of syntactic lambdas in the glue language is the power 
of B in a semantic object’s prefix. It is the same as the number of argument 
slashes in the syntactic type, and no confusion arises. 

With PCTT we can eliminate types such as (4) from the space of possible
categories, hence possible grammars.


(4) a. *sleep := S: λx.sleep′x
    b. *sleep := (S\NP)/NP: λx.sleep′x


The first example says that all sleeping is syntactically memorized,
because it does not take any syntactic arguments, yet its semantics might suggest
that (a) it does take a syntactic argument since it is a reflection of B¹Isleep′, or
(b) it is a function, in which case what it is a function of is not clear since the
syntactic type is not S^X for some X. If it is a property named sleep, as in sleep
causes absenteeism, then it would be fine but inconsistent with other
properties, which are usually of type N or NP, but not S. Only cross-situational
learning can remedy this problem, therefore the argument role/property
interpretation must be considered legitimate.

The second example (4b) does not claim that sleep′ cannot be a transitive
verb. PCTT and its combinatory origin (Schönfinkel-Curry arity) simply say
that if it is, then there must be another lambda, otherwise this category cannot
be construed as the knowledge of the word.
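
The arity argument behind (4) can be made concrete with a small check. The sketch below is illustrative only: it assumes that a lexical entry records the power of B in the prefix of its semantic object (1 for B¹Isleep′, 0 for man′), and the tuple encoding of categories and the function names are hypothetical.

```python
# A category is a basic symbol or a triple (result, slash, argument).

def syntactic_arity(cat):
    """Count the argument slashes of a syntactic type."""
    n = 0
    while not isinstance(cat, str):
        cat = cat[0]      # descend into the result category
        n += 1
    return n

def arity_consistent(cat, b_power):
    """True if the number of argument slashes equals the power of B
    recorded for the semantic object (the number of syntactic lambdas)."""
    return syntactic_arity(cat) == b_power

IV = ("S", "\\", "NP")           # S\NP
TV = (IV, "/", "NP")             # (S\NP)/NP

man_ok = arity_consistent("N", 0)      # N: man', no syntactic lambda
sleep_ok = arity_consistent(IV, 1)     # S\NP with B^1 I sleep'
sleep_4a = arity_consistent("S", 1)    # (4a): no slash, one syntactic lambda
sleep_4b = arity_consistent(TV, 1)     # (4b): two slashes, one syntactic lambda
```

On this encoding the two starred entries in (4) fail the check, while N for man′ and S\NP for sleep′ pass it. The check says nothing about the property reading of sleep discussed above, which would assign the entry a different power of B.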

Thus the system is conditional on the current assumptions about the syn- 
tactic reflection of states of affairs, and needs no universal base such as in 
Jackendoff (1997) or Hopper and Thompson (1980) (the latter work assumes 
transitivity is universal). There can be a ditransitive sleep predicate as far as 
CCG is concerned, a fact which we must be able to discern from its syntactic 
behavior. 

The syntactic lambdas and the semantic ones can be eliminated by
eta-reduction as we have seen. What cannot be eliminated are the structural
unknowns of the Logical Form (LF), if we follow the LF-friendly combinatory
path. In that sense, Steedman's unknowns are not the kind of objects that
Schönfinkel's combinators are designed to eliminate.

Steedman (2000b) offers two more substantive principles, the Principle 
of Lexical Head Government (PLHG), and the maxim of Head-Categorial 
Uniqueness (HCU). The first principle amounts to saying that lexical cate- 
gories must not proliferate just because there are many syntactic contexts in 
which a lexical item can take part, such as the word chews in the examples 
below, among others. 


(5) a. The cat chews the mat. 
b. The cat chews itself. 
c. the mat which I believe the cat chews 
d. The cat chews and the dog scratches the mat. 
e. This mat the cat chews all the time. 


By the same principle, the passive in the mat was chewed by the cat and 
the infinitive in the cat wants to chew the mat involve the same lexical item, 
namely chew. These principles do not reduce the space of possible categories, 
but they do put constraints on individual grammars, which makes the size of a 
grammar a meaningful number. McConville (2006) makes use of this number 
to choose among potential competence grammars. 

The principles we have covered so far bear on lexical correspondences,
and they reduce the space of possible grammars because, by the radical
lexicalization of the rule-to-rule hypothesis, a particular grammar can only be
read off the lexical syntactic types. We shall see in §9.7 that the theory of
functional categories employed in transformational grammar can also be seen
as providing further constraints on possible syntactic types. It is considered
a meta-theory for CCG because functional categories do not seem to arise from
combinatory dependencies, therefore not from a combinatory manifestation of
adjacency. For example, λP.Pa′ can characterize both syntactic subjects and
syntactic objects with semantics a′. Their differences in agreement and
finite domains must arise from differences in the syntactic features of basic
categories in a syntactic type. PCTT can only partially help in these
matters, such as distinguishing S/(S\NP) and S\(S/NP), so that a
theory of agreement or binding can make use of the distinction.

Szabolcsi's (1989, 1992) constraints on the lexicon narrow down the pos- 
sible lexical categories, hence, by radical lexicalization, possible grammars. 

We can also think of other kinds of substantive constraints on possible 
grammars, some of which need not worry a grammar theorist. For example, 
what could stop a group of people from acquiring a language in which every
sentence ends with the same word? A linguistic theory would be overextend- 
ing itself in trying to address such matters when experience can sort it out. It 
might be in the Zipfian tail of possible languages. 


Chapter 8 
A BTSO system 


What can be the syntactic roles of the combinators other than BTSCWZ? I
list the remaining set below, along with their equivalences:


(1) Y   Yx = y = xy for some y depending on x
    Φ   Φxyzw = x(yw)(zw)       Φ = B(BS)B
    Ψ   Ψxyzw = x(yz)(yw)       Ψ = B(BW(BC))(BB(BB))
    J   Jxyzw = xy(xwz)         J = B(BC)(W(BC(B(BBB))))
    O   Oxyz = x(λw.y(zw))      O = C(BBB)B


Recall that C = B(T(BBT))(BBT), and W = ST. Thus with the exception
of Y, they must be lexicalized in a BTS system, according to Szabolcsi's
criterion in §5(38). We have seen in §4.1 that Y is not finitely typeable, hence its
finite representability cannot be assumed. Let us look at the finitely typeable
ones. I leave out J because, as explained in §4.4, its behavior has not been
observed in any language.
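
The equivalences for C, W, Φ and O can be checked mechanically. The following sketch writes B, T and S as curried Python functions and builds the other combinators only from the definitions given above; the Python names (PHI, O_direct and the small test functions) are assumptions of the sketch, and a few sample applications are of course not a proof.

```python
# Primitive combinators as curried functions.
B = lambda f: lambda g: lambda x: f(g(x))        # Bfgx = f(gx)
T = lambda x: lambda f: f(x)                     # Txf  = fx
S = lambda f: lambda g: lambda x: f(x)(g(x))     # Sfgx = fx(gx)

# Derived combinators, transcribed from the equivalences in the text.
C = B(T(B(B)(T)))(B(B)(T))    # C = B(T(BBT))(BBT)
W = S(T)                      # W = ST
PHI = B(B(S))(B)              # Φ = B(BS)B
O = C(B(B)(B))(B)             # O = C(BBB)B

# Direct definition of O, with its inner lambda abstraction.
O_direct = lambda x: lambda y: lambda z: x(lambda w: y(z(w)))

# Small test functions (hypothetical, for illustration only).
pair = lambda a: lambda b: (a, b)
add = lambda a: lambda b: a + b
double = lambda n: n * 2
succ = lambda n: n + 1
at_three = lambda h: h(3)      # a function that takes a function argument

c_result = C(pair)(1)(2)              # Cfxy = fyx
w_result = W(pair)(3)                 # Wfx = fxx
phi_result = PHI(add)(double)(succ)(5)   # x(yw)(zw)
o_result = O(at_three)(succ)(double)
o_direct_result = O_direct(at_three)(succ)(double)
```

Note that O built from C, B and B takes exactly three arguments here; the variable bound inside λw.y(zw) never becomes an argument of O itself, which is the property discussed below.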

Recall also that Szabolcsi's hypothesis is not sufficient to rule out K and I
from syntax. It is a formal restriction. We needed empirical support to
eliminate K and I. We also needed empirical support to suggest why B and S must
operate binarily and not ternarily, which was also not covered by her
hypothesis. These efforts can be considered as investigating the empirical import of
Schönfinkel's fully binarized function-argument notation (currying), an
otherwise formal result.

Take for example Φ and O. Φ's semantics is that of coordination. The
formal criterion suggests that it is lexicalizable because Φ = B(BS)B.
Empirically it is clear that coordination is lexicalized in languages, because there are
languages which do not have syntactic coordination, for example Hixkaryana
(Derbyshire 1979) and Dyirbal (Dixon 1972). And every coordinating
language seems to have a lexical head for it (and, but etc.), or to restrict it to certain
tunes. That is, there is always some syntactic object, even if it is not a word,
to which we can assign the semantics of coordination in the lexicon, in the
manner of Steedman (2000a). Therefore both formal and empirical results
suggest that Φ must be a lexicalized combinator.

Not so for O. By the formal criterion (§5(38)), it can be lexicalized
because O = CB²B, and C is definable by B and T. Empirical facts suggest
otherwise. Recall that unlike other combinators, O is a combinator but not a
supercombinator. This is evident in its definition Ofgh = f(λx.g(hx)), with its
unmovable inner lambda abstraction: x is not an argument of O.

This combinator seems to be at odds with lexicalization when we consider
that we are facing O semantics in strings such as what you can (2), which
seems not to be lexicalized, for example what you can and what you should
not do.


(2) what          you            can
    S/(S/NP)      S/(S\NP)       (S\NP)/(S\NP)
    : λQ.?yQy     : λf.f you′    : λPλx.can′(Px)
                  -------------------------------
                  S/(S\NP)
                  : λP.can′(P you′)
    ---------------------------------------------
    S/((S\NP)/NP)
    : λP.?y can′(P y you′)


Does this justify the incorporation of O into syntax? Recall the
syntacticization of binarized O, which is at work in (2):


(3) X/(Y/Z): f   Y/W: g   →   X/(W/Z): λh.f(λx.g(hx))   (>O)
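
The forward rule can be sketched as a function over pairs of a category and a semantic object. This is a minimal illustration, not an implementation of a CCG parser: the tuple encoding, the name forward_O, and the encoding of logical forms as nested Python tuples (with "wh" standing in for the ?y binder) are all assumptions made for the example.

```python
# A category is a basic symbol or a triple (result, slash, argument).
# A sign is a pair (category, semantics); semantics are Python functions.

def forward_O(left, right):
    """X/(Y/Z): f   Y/W: g   =>   X/(W/Z): λh.f(λx.g(hx)).
    Returns None if the two signs do not fit the rule."""
    (lcat, f), (rcat, g) = left, right
    if not (isinstance(lcat, tuple) and isinstance(rcat, tuple)):
        return None
    X, s1, YZ = lcat
    if not isinstance(YZ, tuple):
        return None
    Y, s2, Z = YZ
    Y2, s3, W_ = rcat
    if not (s1 == s2 == s3 == "/" and Y == Y2):
        return None
    cat = (X, "/", (W_, "/", Z))
    sem = lambda h: f(lambda x: g(h(x)))
    return (cat, sem)

VP = ("S", "\\", "NP")                                    # S\NP

# what := S/(S/NP) : λQ.?yQy
what = (("S", "/", ("S", "/", "NP")), lambda Q: ("wh", "y", Q("y")))
# you can := S/(S\NP) : λP.can'(P you'), as derived in (2)
you_can = (("S", "/", VP), lambda P: ("can", P("you")))
# a transitive verb meaning, object first and subject second
do = lambda obj: lambda subj: ("do", obj, subj)

cat, sem = forward_O(what, you_can)
logical_form = sem(do)
```

With these inputs cat is S/((S\NP)/NP) and logical_form corresponds to ?y can′(do′ y you′), in line with the last step of (2).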


Hoyt and Baldridge (2008) provide the following examples from various 
languages which cannot be handled by a BTS system, a result which suggests 
free operation in syntax. They call such constructions cross-conjunct extrac- 
tion, first noted by Pickering and Barry (1993). All bracketed strings in these 
examples arise from syntactic and semantic assumptions similar to (2). 


(4) a. ... [What you can] and [what you must not] base your verdict on

    b. [dat ik haar wil] en [dat ik haar moet] helpen
       that I her want and that I her can help
       ‘that I want to and that I can help her.’ Dutch

    c. [Wen kann ich] und [wen darf ich] noch wählen?
       who can I and who may I still choose
       ‘Whom can I and whom may I still choose?’ German

    d. Gandes-te [cui ce] vrei,
       consider-IMP.2s-REF.2s who.dat what want.2s
       si [cui ce] poti, sa dai.
       and who.dat what can.2s to give.SUB.2s
       ‘Consider to whom you want and to whom you are able to give
       what’ Romanian

    e. [Me lo puedes] y [me lo debes] explicar
       me it can.2s and me it must.2s explain
       ‘You can and should explain to me’ Spanish


But, as they note, the same effect can be achieved by having multiple
categories for function words because these kinds of semantic dependencies are
headed by them. The Turkish facts lead to the same conclusion: it is the
relative pronoun that seems to engender such kinds of constituencies.

(5) a. Ben-im uyu-ma-dığı-nı [savun-duğum] ve [ispat et-tiğim] şoför
       I-1s sleep-NEG-COMP-ACC defend-REL.1s and proof do-REL.1s driver
       ‘The driver who I claimed and proved that s/he did not sleep.’

    b. *Ben-im uyu-ma-dığı-nı [savun-duğum] ve [ikna ol-duğum] şoför
                                                persuade be-REL.1s

    c. savun             -duğ-um
       S\NPagr\S′acc     (NP/NP)\NP′\(S\NP\NP)
       ---------------------------------------- (<O)
       (NP/NP)\NP′\(S′acc\NP)

The crucial step that distinguishes (5a-b) is shown in (5c). It is the backward
variety of (3). The verb ikna ‘persuade’ requires a dative-marked
nominalized clause, therefore it cannot yield a like-category with savunduğum, which
needs an accusative-marked complement clause. This information is
transparently projected by O.


(6) Y\W: g   X\(Y\Z): f   →   X\(W\Z): λh.f(λx.g(hx))   (<O)


Example (5c) might appear to suggest that the derivation can be lexicalized
because a phonological word is syntactically derived, but the coordination
data such as (5b) and (7) show that what takes place is indeed syntax:


(7) Ben-im dava-sı-nı [bil-ip savun]-duğum adam
    I-1s lawsuit-POSS.3s-ACC know-CONV defend-REL.1s man
    ‘The man whose lawsuit I knew and which I defended.’


The extra categories which allow us to lexicalize the O semantics in these
examples are not well motivated in English or Turkish. Take for example
the category S/(VP/NP)/(S/NP) for what, which Hoyt and Baldridge
(2008) rightly consider doubtful, in addition to its well-motivated category
S/(S/NP). The last category is empirically sound, as shown in (8a-b), but the
extra category is not always sound (cf. 8c-d). Thus attempts to keep such data
under the BTS syntax by lexicalizing the O are not very convincing.

(8) a. What        did John hit?
       S/(S/NP)    S/NP

    b. What        you can     and what you must not     do
       S/(S/NP)    S/VP                                  VP/NP
       -------------------
       S/(VP/NP)

    c. What                 did John hit?
       S/(VP/NP)/(S/NP)     S/NP
       -------------------------
       ?? S/(VP/NP)

    d. What                 you can     and what you must not     do
       S/(VP/NP)/(S/NP)     S/NP                                  VP/NP
       -------------------------
       S/(VP/NP)

We know that O does not satisfy Szabolcsi's formal criterion for free
operation in a BTS system, because O = C(B²)B, and C = B(T(BBT))(BBT).
We also know that adding O to syntax would not change the
automata-theoretic results because of the possible formulation of O by B and T as
above; Vijay-Shanker and Weir's (1994) argument for linear-indexed
behavior of CCG makes crucial use of these combinators, and only these
combinators.

In summary, lexicalizing the O because of these concerns poses an empir- 
ical problem to a CCG lexicon, and ignoring the O-constituents would mean 
a loss of empirical coverage in syntax. 

The binary O is not redundant in a system of binary B, binary S and the
finite powers of B. Let us look at the formulation of O without C to see this
result. O = (B(T(BBT))(BBT))(B²)B. Although binary B is at work in
this definition, it also needs unary T, unary B and unary B², to yield the
O-semantics for adjacent substrings ω₁ and ω₂. Thus the O-constituents need
the binary O because some of these combinators are not freely operating.

The BTSO system which emerges from these considerations is listed in 
Table 2. I suggest the name orifice for O to symbolize its ‘leaking lambda’ 
inside the dependencies. All possible directional-modal alternatives of com- 
binators are listed for completeness. Only small powers are presented to save 
space. Since any lexicon is bounded by a maximum number of arguments, 
say n, we can take the required power to be m-1 where m is the maximum of 
such n among possible languages, which is by definition some number, rather 
than a variable. Steedman (2000b) suggests n=4 for English. 


Table 2. The syntacticized BTSO system.

Application     X/Y         Y            →  X             (>)
                Y           X\Y          →  X             (<)
Composition     X/Y         Y/Z          →  X/Z           (>B)
                Y\Z         X\Y          →  X\Z           (<B)
                X/Y         Y\Z          →  X\Z           (>B×)
                Y/Z         X\Y          →  X/Z           (<B×)
                X/Y         (Y/Z)|W      →  (X/Z)|W       (>B²)
                (Y\Z)|W     X\Y          →  (X\Z)|W       (<B²)
                X/Y         (Y\Z)|W      →  (X\Z)|W       (>B²×)
                (Y/Z)|W     X\Y          →  (X/Z)|W       (<B²×)
Type Raising    A                        →  T/i(T\iA)     (>T)
                A                        →  T\i(T/iA)     (<T)
Substitution    (X/Y)/Z     Y/Z          →  X/Z           (>S)
                Y\Z         (X\Y)\Z      →  X\Z           (<S)
                (X/Y)\Z     Y\Z          →  X\Z           (>S×)
                Y/Z         (X\Y)/Z      →  X/Z           (<S×)
                (X/Y)|Z     (Y/W)|Z      →  (X/W)|Z       (>S²)
                (Y\W)|Z     (X\Y)|Z      →  (X\W)|Z       (<S²)
                (X/Y)|Z     (Y\W)|Z      →  (X\W)|Z       (>S²×)
                (Y/W)|Z     (X\Y)|Z      →  (X/W)|Z       (<S²×)
Orifice         X/(Y|Z)     Y/W          →  X/(W|Z)       (>O)
                Y\W         X\(Y|Z)      →  X\(W|Z)       (<O)
                X/(Y|Z)     Y\W          →  X\(W|Z)       (>O×)
                Y/W         X\(Y|Z)      →  X/(W|Z)       (<O×)

Legend: >   forward               <   backward
        >×  forward crossing      <×  backward crossing
        A   argument types of class of values T
        T   value types of class of arguments A

It is a prediction of CCG that all these rules can be potential mergers
in some language. They are not different in kind because they arise from 
currying and the adjacency of combinators, but they all manifest a different 
kind of syntacticized semantic dependency, including directionality. Thus the 
explanation offered by CCG is that syntax can be a reflex-like process be- 
cause nothing needs to be remembered in the construction of constituency or 
interpretation—i.e. in parsing—when all the possible dependency projections 
are factored into the universal rules. Thus every word and phrase projects its 
syntax and semantics onto surface constituents, and they do not fall prey to 
some grammar-external constraint when taking part in syntax. 

We have seen the harmonic composition rules and some substitution rules 
at work. Below I exemplify the crucial involvement of most of the remaining 
possibilities listed in Table 2.9" 


(9) a. Den Hund den ich fütterte German
       the dog that I fed
       >B×: [ich]S/(S\NP) [fütterte](S\NP)\NP

    b. John noticed suddenly the man with the big black briefcase.
       <B×: [noticed]VP/NP [suddenly]VP\VP

    c. I offered, and may give, a flower to a policeman.
       >B²: [may](S\NP)/VP [give](VP/PP)/NP

    d. Adam dilenci-ye sadaka, kadın çocuğ-a mendil ver-di usul-ca.
       man beggar-DAT alms woman child-DAT napkin gave gently
       ‘The man gently gave alms to the beggar, and the woman a napkin
       to the child.’ Turkish
       <B²: [verdi](S\NPnom)\NPdat\NPacc [usulca](S\NPnom)|(S\NPnom)

    e. Adam dilenci-ye sadaka, kadın çocuğ-a mendil usul-ca ver-di.
       >B²×: [usulca](S\NPnom)|(S\NPnom) [verdi](S\NPnom)\NPdat\NPacc

    f. <B²: [showed](S\NP)/NP/NP [gently](S\NP)\(S\NP)

    g. Welke boeken heb je zonder te lezen weggezet? Dutch
       which books have you without reading away-put
       >S×: [zonder te lezen](VP/VP)\NP [weggezet]VP\NP

    h. He is the man I will persuade every friend of to vote for.
       >S: [persuade every friend of](VP/VP)/NP [to vote for]VP/NP

    i. Welche Artikel hast du abgelegt ohne zu lesen? German
       which article have you away-put without reading
       <S: [abgelegt]VP\NP [ohne zu lesen](VP\VP)\NP

    j. What book did you lend without reading and send without
       understanding to Harry?
       <S: [lend](VP/PP)/NP [without reading](VP\VP)/NP

    k. Kitab-ı Ahmet’e dergi-yi Ayşe’ye oku-ma-dan ver-di-m
       book-ACC A-DAT mag.-ACC A-ACC read-NEG-ABL give-PERF-1S
       ‘I gave without reading the book to Ahmet and the magazine to
       Ayşe.’ Turkish
       >S²×: [okumadan](VP|VP)\NPacc [verdim]VP\NPdat\NPacc

The reader can consult Steedman (1996b, 2000b, 2011), Steedman and
Baldridge (2011), Baldridge (2002), Hoffman (1995), Hoyt and Baldridge
(2008), Szabolcsi (1992), Jacobson (1990, 1999), Prevost (1995), Komagata
(1999), Trechsel (2000), Bozsahin (1998, 2002) and the references cited in
these works for a comprehensive list of syntactic constructions studied in
detail from this perspective, including gapping, coordination, relativization,
cross-conjunct extraction, control, raising, passives, binding, scope, heavy
NP and dative shift, nesting and crossing dependencies, word order and
its variation, intonation structure, information structure and word structure.
Grammatical organizations that affect a subclass of lexicons en masse, such
as accusativity, ergativity and their interaction with subject-, agent- and
topic-prominence, are upcoming work. McConville (2006) and Steedman (2006)
provide typological perspectives on CCG.

The discussion in this section gives us a semantically motivated formal 
base, which we can take to be language invariant. It is the only resource that 
can constrain a free closure of the lexicon in deriving surface strings, to give 
us a landscape of possible languages. Possible lexical categories are limited 
too, as we have seen in Chapter 6 and Chapter 7. 

The choices adopted in the remainder of the book among the possible 
CCG options are as follows. We will assume them in the subsequent chapters 
where within-school differences are less important than different perspectives 
on syntax-semantics. 


(i) A freely generating binary BTS system, which makes no reference to 
substantive categories. 


(ii) No freely generating unary rule. Unary rules are lexical rules—after all 
they do not combine, and they are part of radically lexicalized gram- 
mars, hence by definition they must refer to substantive categories. 
(iii) A proposal to include the binary O in the system, due to its effects on
constituency.


(iv) No wrap. Therefore, a strictly combinatory system arising from
adjacency. Recall that C is not surface wrap; it is a combinator and it is
lexicalized.


(v) A linguistic representation of the predicate-argument dependency
structures, the PADS, as the key locus of deciding on the lexicalized
syntactic types. The constructive work of this choice will be more
evident in the next chapter.


(vi) The book does not cover matters related to binding and quantifier
scope, therefore it can say nothing about LF as a level. Three main
proposals are discussed in some detail (Chapter 6). The analyses in the
next chapter make no use of LF as another rule system, or appeal to a
system of constraints on binding.


(vii) No conditions on derivations. All conditions are generalizations and
constraints over the syntactic types in the lexicon.


(viii) Only the basic categories and the slash bear features of relevance to
syntax, i.e. morphosyntactic features. Thus only these features are
visible to syntax. In effect, this is equivalent to saying that unification does
no linguistic work, except to simply match the categories in rule
application by term unification (see Pareschi and Steedman 1987 for some
discussion). This is in accordance with the agenda of seeing the limits
of order doing all the work in syntax and semantics.


Chapter 9 
The semantic radar 


A syntactocentric view of the landscape of syntactic constructions suggests
that they fall into classes because their syntactic differences are empirically
discernible. Bounded constructions such as passive, reflexive and control are
clause-bounded, whereas constructions such as relativization and
topicalization are not (and why the clause?). CCG's syntacticization of the combinators
as the driving force of the computation of semantic dependencies might
suggest that it is likewise syntactocentric in their explanation.

This chapter attempts to show that this assumption would be wrong. The 
reason has already been implicated in the radical lexicalization of Bach's 
rule-to-rule hypothesis, so that codetermination of syntactic types and seman- 
tic types is the key to understanding why constructions manifest themselves 
the way they do. From this perspective, (un)boundedness must be explained, 
rather than assumed as some kind of syntactic taxonomy, sometimes with hy- 
pergrammatical syntactic principles doing the explaining for their syntactic 
distribution (e.g., subjacency, the A-over-A principle, different kinds of traces
and their governance, exceptions to syntactic projection of expletives, chains, 
phases, differential linking between the argument structure and dependency 
structure, etc.). From the perspective of order-caused combinatory syntax 
and semantics, the explanation lies in the syntax-semantics interaction, and 
for that we need to see how semantics can shape the syntactic types. The 
same conclusion seems inescapable for understanding language acquisition 
and "competence" in competence grammars. 

This chapter surveys several domains that force us to bring semantics into 
play in the explanations. Just how much we must readjust our semantic radar 
in the grammar might sound like a grandma's recipe for cooking: not too 
much, not too little. I elaborate in the chapter in more detail. We cannot go 
as deep as concepts, and suggest that semantics completely determines
syntax, or that syntax could work with semantic types. Nor can we stay with 
what little information the syntactic types can provide us in lieu of semantics, 
and suggest that syntax completely determines semantics, or do semantics 
with syntactic objects. 

In all the cases we are going to cover, the semantics that must take part in
the process are the individual's hypotheses about meanings, i.e. the
predicate-argument structures and dependency structures which must arise from (or
feed into) grammars. The construal of these meanings, either by
individual experience or by social construction as suggested by Halliday (1978),
is the real thing, the experience itself, not a hypothesis. The manifestation of
PADS objects in the hypotheses, such as the (e,e) type for pronouns, or (e,t),
((e,t),t), complements the picture by pinning down their model-theoretic
interpretation, but the crucial involvement of the lexical predicate-argument
structures will be the decisive factor for syntactic types.


1. Boundedness and unboundedness 


There seem to be two ways that lexicalized predicate-argument structures
(e.g. verbs) can manifest themselves in syntax, assuming that we are confin- 
ing ourselves to participant-taking elements, i.e. words with a thematic struc- 
ture: (i) heed a local argument, or (ii) heed an argument of an argument. From 
the view of order-instigated semantics, there seems to be no other option. 
The first option leads to a theory of voice. Our purpose here is to un- 
derstand why it is clause-bounded. Their differing possibilities, for example, 
why the passive targets objects, the reflexive reduces arguments on them- 
selves sparing the subject, and the reciprocal correlates them in the manner 
of the reflexive, are of course part of the explanation. Steedman’s LF, Jacob- 
son's type-shifting rules, and Szabolcsi's constitutive principles of grammar 
mentioned in Chapter 6 are combinatory attempts at an explanation. Here I 
will concentrate on (un)boundedness, and use the passive as the first example. 


1.1. The passive


It is well known that the passive cannot cross clause boundaries. (1b) attempts
to passivize the embedded predicate of (1a), where the promoted object is not 
local. (1c) is an attempt to passivize the matrix predicate while promoting 
the embedded object to subject. (1d) passivizes the matrix predicate where 
the embedded subject is demoted to a by-phrase. This is not a passivization 
of (1a). 


(1) a. His closest friend claimed that Kafka loved chemistry. 
b. *Chemistry claimed that was loved by Kafka. 
c. *Chemistry was claimed by his closest friend that Kafka loved.
d. *That Kafka liked chemistry was claimed by Kafka. 


A purported “long-distance passive" would be misleading, because it would 
in fact be clause-bounded passivization followed by some other syntactic pro- 
cess. In (2a—b), the process is topicalization by fronting from the embedded 
clause in brackets. It is not the Turkish equivalent of (1b). It is grammat- 
ical because, unlike English, Turkish is a pro-drop language and it allows 
scrambling to the topic or the postmatrix-verb position from any level of em- 
bedding. Example (2c) would be the true long-distance passive where the 
matrix verb of (2a) is passivized but the matrix subject reduces the embedded 
predicate. 


(2) a. Wittgenstein [Kafka’nın kimya-yı sev-diğini] bilmiyor-du.
       W K-3s C-ACC like-COMP not know-PERF
       ‘Wittgenstein did not know that Kafka liked chemistry.’ Turkish

    b. Kimya-nın, Wittgenstein, [Kafka tarafından sev-il-diğini] bilmiyor-du.
       C-3s W K by-3s like-PASS-COMP not know-PERF
       ‘Wittgenstein did not know that chemistry was loved by Kafka.’

    c. *[Kimya-nın Wittgenstein tarafından sev-diğini] bil-in-miyor.
       C-3s W by-3s like-COMP not know-PASS-PERF

We have yet to see examples such as (1b-d) and (2c) work in any language.
Why is that? It is one thing to say that passive is clause-bounded, and build an 
entire model of syntactic computation with that understanding of domain of 
locality, and another to explain why it is so. I will sketch an analysis to exem- 
plify the order-induced view of the syntax and semantics of the construction. 

The simplest description of a morphologically-marked passive is that a
syntactically and semantically transitive verb becomes syntactically
intransitive, where the arity reduction causes the participant-type object to show the
morphological signs of a subject (Payne 1997). This will do for our purposes,
which is not to give a full account of the passive but to explain its
clause-boundedness.
Passive is not a universal phenomenon. Washo lacks a passive; see
Jacobsen (1979). When attested, it is always lexically headed by a bundle of
features which we can call the passive morpheme. Consequently, no one 
expects a universal passivizer anymore (say a transformation; cf. Chomsky 
1957, Bresnan 1978, Bach 1980). This leaves lexical categories to do the ex- 
plaining for clause-boundedness across languages. 

Since passive is voice (it needs participants), it operates on verbal cate- 
gories in any language, not just on predicational categories. We need some- 
thing of the type S$|NP as a domain, rather than NP$ or S$. The notation uses 
the dollar convention of Steedman. 

The category schema of the passive, S$|NP, can be verified in languages 
where nonverbal predication is possible, including finite (tensed) matrix 
clauses. Voice is not possible in such cases (hasta can be NP/NP but not 
S\NP): 


(3) a. Annem          hasta.                              Turkish
       mother.POSS.1s ill
       ‘My mother is ill.’
    b. *Annem hasta-n-dı.
              ill-PASS-PERF
       for ‘My mother has been taken ill.’


The passive involves an arity reduction of one argument, where the result type
must have at least one argument left to show subject properties, because every
tensed clause must be fully interpretable. We can revise our domain to involve two
or more participants, i.e. S|NP$_i|NP, and range one less, i.e. S|NP$_i, where
the common index on the dollar sign means the same member of the lexical
generalization is assumed.

For simplicity I am assuming that the type NP can be made a participant- 
type phrase in a language. The important distinction we use here between the 
arguments, the participants and the properties does not necessarily need ex- 
tra degrees of freedom in a type-dependent radically lexicalized theory as it 
does in for example Construction Grammar. Participance can be achieved in 
a type-dependent grammar by type-raising all the NP arguments that are onto 
S. It suffices for our purposes to note that NP/NP would not be a participant-type
but NP can be when it is type-raised. For example, λx.man'x denotes a
property; the variable x does not have a syntactic correspondent. λx.sleep'x,
however, denotes a predicate because on the syntactic side it corresponds to
an S\NP, therefore its x is a participant. When we type-raise an a' to λP.Pa'
we can see the narrowing of roles by the lexical syntax-semantics correspondence:
if P corresponds to a syntactic argument-taking object such as a verb
with S|NP$ type for some ‘|’, then a’ is a participant. If not, then it can be 
something else, perhaps a property. (In other words, participance and argu- 
menthood arise from lexical distinctions rather than some primitives.) One 
way to impose the participant versus property constraint in a computationally 
conservative way is to say that NP/NP is not an argument type that is suitable 
for type-raising. 

We can begin to radically lexicalize the skeletal category of the passive,
(S|NP$_i)|(S|NP$_i|NP), to encode that subject and object are the participatory
roles involved. (We cannot assume from this category that it is always the out- 
ermost '|NP' of the domain which is the object. In Welsh, a VSO language, 
that argument is the subject.) Following Steedman and Baldridge (2011), we 
get the category for the passive morpheme -en in English. I will assume coin- 
dexed slashes for the present discussion without notational clutter. 


(4) pass': -en := (S_en\NP$)\_LEX((S\NP)$/NP) : λPλx_n ··· λx_2.P x_n ··· x_2 one'
    where x_n ··· x_2 one' is pointwise match of arity in (S\NP)$/NP.


The PADS P x_n ··· x_2 one' fully characterizes the active verb's argument
structure with the terms x_n, ..., x_2, one'. P can be λxλy.adore'xy, but not for
example λy.adore'kafka'y. This follows from the fact that -en applies to lexical
items only (the LEX constraint on the slash). Examples of applying -en
are:


(5) a. written := S_en\NP : λx.write'x one'
    b. given := (S_en\NP)/NP : λxλy.give'yx one'


where one’ is a nonpro-term, symbolizing syntactic but not semantic arity 
reduction. Because of type correspondence in the syntax-semantics pairing, 
one’ can only correspond to the least oblique (maximally LF-commanding) 
argument of P, because it applies last. 
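
The arity reduction in (4) and (5) can be sketched computationally. The Python fragment below is only an illustration under stated assumptions: the names passive, ONE, adore and give and the tuple encoding of terms are mine, not part of the theory, and the LEX constraint is not modeled (nothing stops the function from being handed a non-lexical predicate). It shows only that one' fills the last slot of P while the remaining arguments stay open.

```python
# Illustrative sketch; `passive`, `ONE` and the tuple encoding are assumptions.
ONE = "one'"  # stands for the suppressed participant


def adore(x):
    # λxλy.adore'xy as a curried function; tuples keep the applicative structure visible
    return lambda y: ("adore'", x, y)


def give(x):
    # a three-place lexical predicate, again curried
    return lambda y: lambda z: ("give'", x, y, z)


def passive(pred, arity):
    """Return λx_n ... λx_2 . pred x_n ... x_2 one' for a predicate of the given arity."""
    if arity < 2:
        # the result must keep at least one argument to show subject properties
        raise ValueError("passive needs a predicate with two or more arguments")

    def collect(args):
        if len(args) == arity - 1:
            result = pred
            for a in args:
                result = result(a)
            return result(ONE)  # one' is supplied last
        return lambda x: collect(args + [x])

    return collect([])


adored = passive(adore, 2)
given = passive(give, 3)
print(adored("kafka'"))         # ("adore'", "kafka'", "one'")
print(given("book'")("witt'"))  # ("give'", "book'", "witt'", "one'")
```

The syntactic arity drops by one in each case, while the semantic term still has all of its slots filled.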

This PADS and the LEX constraint are not idiosyncrasies of languages 
like English and Turkish, where the passive morphologically attaches to the 
verb. It is not a question of morphology but grammar. A periphrastic passive 
would have a LEX constraint too, to have access to the thematic structure of 
the passivized predicate. (We shall see in §4 that there are limited other ways
to conspire for the lexical constraint to ensure access to relevant parts of the 
thematic structure, namely the so-called external argument such as in Jaeggli 
1986.) 

Consider the Welsh cael passive as a case in point. For brevity, and in
relevance to one', I will only consider the short passive, where the by-phrase
is not present.


(6) a. Cafodd Wyn ei  rybuddio.
       Got.3s Wyn his warning
       ‘Wyn was warned.’                      Welsh; Awbery (1976: 210)


I repeat Awbery’s description of the passive, which I used earlier to suggest
that a pronoun might be required by syntax: “The passive sentence has
a sentence-initial inflected form of cael (get) of the same tense and aspect as
the verb of the active. This is followed by a noun phrase identical to the object
of the active. Then comes a pronoun of the same person, number and gender
(if it is 3sg) as this noun phrase, and an uninflected form of the verb in the
active” (Awbery 1976: 47). The pronoun and cael are obligatory; Awbery’s
data show that, if the noun phrase after cael is a pronoun, what is dropped is
the subject NP, not the possessive pronoun required by the passive:


(7) Cawsom  (ni) ein rhybuddio gan y   ferch.
    Got.1pl (we) our warning   by  the girl
    ‘We were warned by the girl.’              Awbery (1976: 48)

Cael takes part in constructions not involving the passive, for example Cafodd
Emyr lyfr (Got Emyr a book). Awbery assumes that this is the same cael,
which I will follow.


(8) Cafodd           Emyr    lyfr
    got.3s           E       a book
    S_en/NP/NP_3s    NP      NP
    ------------------------>
    S_en/NP
    -------------------------------->
    S_en

It suggests that the possessive pronoun and cael conspire for a passive read- 
ing (9). 


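(9) Cafodd           Wyn     ein                                 rhybuddio
    got.3s           W       his                                 warning
    S_en/NP/NP_3s    NP      (S\(S_en/NP))/_LEX(S/NP/NP_3s)      S/NP/NP
    : λxλy.get'yx    : w'    : λPλQ.(P one')(Q one')             : λxλy.warn'yx
    ------------------------> --------------------------------------------->
    S_en/NP                   S\(S_en/NP)
    : λy.get'y w'             : λQ.(λy.warn'y one')(Q one')
    -----------------------------------------------------------------------<
    S : warn'(get'one'w')one'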

Notice that, for Welsh, the argument order in the lexical specification of
P for ‘ein’ is VSO: λxλy.warn'yx. Note also the LEX constraint on the
syntactic type of P although it is not morphologically attached.

From the restriction that the passive applies to lexical verbs, because it
requires access to participants therefore to thematic structure, it follows that
the substitution environment which one' faces is always of the form (10a)
for a passivizable predicate pred', not (10b), which would be the semantic
reflex of (1b-d), because e.g. (10b) would not be an arity reduction of pred'
in P but of some x_i. Notice the same (10a) structure of P for English, after
-en seeing the thematic structure and doing all but one last reduction, repeated
here as (10c).


(10) a. (λx_1.pred' x_n ··· x_2 x_1) one'
        \__________________________/
                     P
     b. λx_1.pred' x_n ··· (x_i one') ··· x_1
     c. pass' (-en) := (S_en\NP$)\_LEX((S\NP)$/NP) : λPλx_n ··· λx_2.P x_n ··· x_2 one'


One' as a PADS object could not substitute inside the x_n, ..., x_2 or x_1, even if
pred' were a complement-taking verb such as claim, where the complement
clause has its own lambda abstractions, for example λxλy.love'xy in (1).

That is why the passive is bounded. The thematic structure of an argument
is opaque to a predicate. Inner lambdas are opaque to claim or any
complement-taking predicate, therefore nonsubstitutable. This result translates
directly to the syntactic types involved. The construction arises from
the interaction of its constraint with the one-at-a-time substitution in syntax
and semantics. This property is not a fortunate convenience of lambda calculus;
any syntax-semantics connection based on order alone ought to negotiate
a similar correspondence.

The universal semantics of the passive (that it needs predications of partic- 
ipatory sort, e.g. verbs) explains why it is clause-bounded: the types of NPs 
involved must be functions from participatory types onto S, i.e. type-raised 
NPs, to be able to distinguish participatory vs. nonparticipatory events. The 
Turkish distinction S\NP versus NP/NP arises from this aspect (3), where 
the type NP/NP is not type-raised. 

Therefore, the syntactic boundedness of the passive follows from its se- 
mantic dependencies and their syntactic reflection: it applies to lexical verbs. 
However, the LEX constraint involved in this model is a one-way implica- 
tion. For example, the passive and the reflexive are bounded, and they both 
arise from the LEX constraint (Steedman and Baldridge 2011). But boundedness
does not necessarily imply the LEX constraint. Take control, which is
bounded, as shown in (11a), but without the LEX constraint (11b).


(11) a. I can persuade Mary_i to persuade the wine taster_j to _j/*i try
        whisky.
b. I want to (seriously challenge) _¢ (the LEX constraint). 


Radical lexicalization predicts that the LEX constraint cannot be the whole 
story about boundedness, because some limited degrees of freedom still exist 
to conspire for boundedness, which are made available when semantics is 
considered as part of the hypothesis space. Upcoming work attempts to work 
out the typology of control from a radically lexicalist perspective. 


1.2. The relative


Unbounded dependencies follow from similar semantic considerations. Con- 
sider relativization, (12). 


(12) The field which I can safely claim that Kafka could convince Wittgen- 
stein that Russell might like 


The kind of PADS that we see in such dependencies seems not to arise 
from the predicate-argument structure of a predicate, but from the predicate- 
argument structure of the arguments of a predicate. Naturally, we expect the 
syntactic types to reflect the difference faithfully. 

For example, reflexivization and passivization, given a predicate, say
λxλy.pred'xy, reduce or equate the x or y argument of the predicate pred',
hence they can be sensitive to its thematic roles. Unbounded
dependencies seem to leave it to the arguments x and y:


(13) a. Adam-ın oku-duğunu    san-dığ-ım   kitap                Turkish
        man-3s  read-COMP.3s  think-REL-1s book
        ‘The book which I think the man read’

     b. Kitab-ı  oku-duğunu   san-dığ-ım   adam
        book-ACC read-COMP.3s think-REL-1s man
        ‘The man who I think read the book’

     c. Sen-in kitab-ı  oku-duğunu   bil-diğini   san-dığ-ım   adam
        You-2s book-ACC read-COMP.3s know-COMP.3s think-REL-1s man
        ‘The man who I thought you knew read the book’


The reason I switched to the verb-peripheral language Turkish is to show 
that when word order constraints are not there, the semantics of these depen- 
dencies seem to know no limits as far as the thematic structure of the embed- 
ded verb is concerned. Note also that, to the verb san above, the argument 
structure of oku is opaque. 

The reason that examples such as (13b) can be ungrammatical in a verb- 
medial language like English—see (14)—is not the unavailability of this se- 
mantics because of the opaqueness of thematic roles, but the word order of 
the language acting as a further constraint on this construction. 


(14) *The philosopher who I can safely claim that Kafka could convince 
Wittgenstein that would change the world 


All verb-medial and verb-peripheral languages show this asymmetry, barring 
of course idiosyncratic restrictions (e.g. Inuit only allows ergative NPs to be 
extracted, although it is verb-peripheral).

The path to unboundedness follows the arguments-of-the-arguments track, 
limited only by external factors such as agreement in Latin relative pronouns, 
and word order constraints. It is thus a conspiracy of semantics and syntax, 
and all that we need to capture this aspect is a type-dependent conception of 
a category. Unlike the semantics in (10) where one' cannot be associated with
any x_i because it needs access to the thematic roles of pred', these dependencies
must be blind to thematic roles, and the only way they can do this is to
associate it necessarily with an x_i. We get the following semantics of relative
pronouns as a result of that, which seems cross-linguistically generalizable: 


(15) relpro' = λPλQ.(∃x)and'(Px)(Qx)


Notice that x is not a syntactic variable, and it is not an argument of a predi- 
cate whose thematic structure is transparently visible; P and Q are opaque to 
relpro'.

It follows then that the reason why relativization is an unbounded depen- 
dency is because P and Q can have their own syntactic lambdas as well so 
that x can be passed down to them indefinitely. That would in turn require the 
argument-taking arguments of P, i.e. think-, say-, claim-, tell-like verbs. For 
example, here is the unfolding of the PADS for the bracketed fragment of the 
string the philosopher [who I claimed that Wittgenstein adored]:

(16) who I claimed that Wittgenstein adored :=

     λPλQ.(∃x)and'(Px)(Qx) (claim'(λz.adore'z witt')i') =β
     λQ.(∃x)and'(claim'(λz.adore'z witt')i' x)(Qx) =β
     λQ.(∃x)and'(claim'(adore'x witt')i')(Qx)
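
The way x is handed down in (16) can be mimicked with ordinary functions. The following Python sketch rests on assumptions of mine: terms are encoded as tuples, the bound variable is spelled "x", and embed is a hypothetical helper that keeps a proposition open while wrapping it under a complement-taking verb. It is meant only to show that relpro' never inspects P and that nothing limits the depth at which x ends up.

```python
# Illustrative sketch; `relpro`, `embed` and the tuple encoding are assumptions.
def relpro(P):
    # λPλQ.(∃x)and'(Px)(Qx); P and Q are opaque functions here
    return lambda Q: ("exists", "x", ("and'", P("x"), Q("x")))


def embed(verb, subject, P):
    # wrap the open proposition P under a complement-taking verb, keeping it open
    return lambda z: (verb, P(z), subject)


def adored_by_witt(z):
    return ("adore'", z, "witt'")      # λz.adore'z witt'


def philosopher(x):
    return ("philosopher'", x)


claimed = embed("claim'", "i'", adored_by_witt)    # λz.claim'(adore'z witt')i'
term = relpro(claimed)(philosopher)
deeper = relpro(embed("think'", "you'", claimed))(philosopher)

print(term)
print(deeper)
```

The first term corresponds to the last line of (16); the second adds one more think-layer without any change to relpro.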


Radically lexicalizing the semantics of this kind spells the following cat- 
egories for English. Assuming similarly semantically inspired categories for 
claim-like verbs, the transparent syntacticization of the combinators simply 
reflects these dependencies on syntax. The crucial steps are shown in (17c). 


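(17) a. that := (N\N)/(S|NP) : λPλQ.(∃x)and'(Px)(Qx)
     b. whom := (N\N)/(S/NP) : λPλQ.(∃x)and'(Px)(Qx)
     c. the philosopher
        whom          I         claimed    that  Wittgenstein  adored
        (N\N)/(S/NP)  S/(S\NP)  (S\NP)/S'  S'/S  S/(S\NP_3s)   (S\NP)/NP
                                                 ------------------------>B
                                                 S/NP
                                           ------------------------------>B
                                           S'/NP
                                ----------------------------------------->B
                                (S\NP)/NP
                      --------------------------------------------------->B
                      S/NP
        ----------------------------------------------------------------->
        N\N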


It would be inconsistent to say that claim is capable of doing (16) above
and has the type (S\NP)/NP, rather than (S\NP)/S'. The lambda argument
of a ‘/NP’ would not be a syntactic lambda (it might be a property, such as
λx.man'x, with a semantic lambda), whereas the semantic counterpart of an S'
would be expected to have thematic structure. This is captured in the syntacticized
B without extra assumption; it is not possible to get the B claim'adore'
effect of the third line of (17c) syntactically from (S\NP)/NP and S'/NP; we
need (S\NP)/S' and S'/NP.
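
The category bookkeeping of (17c) can be replayed mechanically. The Python sketch below is an illustration under my own assumptions: categories are nested tuples, only forward application and forward composition are implemented, and the agreement feature on NP_3s is ignored. It shows that the derivation goes through with claim typed as (S\NP)/S' and is blocked when claim is typed as (S\NP)/NP.

```python
# Illustrative sketch; the tuple encoding and function names are assumptions.
def fwd(result, arg):
    return ("/", result, arg)       # result/arg


def bwd(result, arg):
    return ("\\", result, arg)      # result\arg


def apply_fwd(left, right):
    """Forward application (>): X/Y  Y  =>  X, else None."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    return None


def compose_fwd(left, right):
    """Forward composition (>B): X/Y  Y/Z  =>  X/Z, else None."""
    if (isinstance(left, tuple) and left[0] == "/"
            and isinstance(right, tuple) and right[0] == "/"
            and left[2] == right[1]):
        return fwd(left[1], right[2])
    return None


VP = bwd("S", "NP")                           # S\NP
whom = fwd(bwd("N", "N"), fwd("S", "NP"))     # (N\N)/(S/NP)
subj = fwd("S", VP)                           # S/(S\NP), type-raised subject
claimed = fwd(VP, "S'")                       # (S\NP)/S'
that = fwd("S'", "S")                         # S'/S
adored = fwd(VP, "NP")                        # (S\NP)/NP

step1 = compose_fwd(subj, adored)             # Wittgenstein adored: S/NP
step2 = compose_fwd(that, step1)              # S'/NP
step3 = compose_fwd(claimed, step2)           # (S\NP)/NP
step4 = compose_fwd(subj, step3)              # S/NP
result = apply_fwd(whom, step4)               # N\N

wrong_claimed = fwd(VP, "NP")                 # claim typed as (S\NP)/NP instead
blocked = compose_fwd(wrong_claimed, step2)   # None: NP does not match S'
```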

It is important to reiterate the universal claim of the type-dependent rad- 
ical lexicalization about the syntactic processes. It does not claim that the 
passive is universally bounded and the relative is universally unbounded. It 
suggests that these behaviors always arise from the transparent projection 
of rule-to-rule assumptions of a language in its lexicon. Any behavior that 
seems universal is a manifestation of the self-organizing constraint that a nat- 
ural grammar would have limited degrees of freedom if it is combinatory, 
type-dependent and radically lexicalized. 

If all languages do something about voice, it is because it seems to arise
from the need to have lexical access to thematic structure, which we showed 
as the LEX constraint. If a class of lexical items specify lexical access to 
thematic structure, then by definition the thematic structure's opaque parts are 
not relevant to them, which might give rise to bounded behavior. If a class of 
predicates allow complements, e.g. say-that, think-that, etc. then unbounded 
behavior is possible but not necessary. 

This way of thinking predicts that when a phrase is only apparently a 
complement but not a syntactic clause, we cannot expect unbounded behav- 
ior. Such morphological ambiguity might arise in morphologically rich lan- 
guages. Consider (18a), which morphologically seems to include a subordi- 
nate clause (18b has the same phonology for the subordinate verb but different 
semantics; I disambiguated the examples in morphological glosses). As the 
semantics of relativization from such clauses show in (18c-d) respectively, 
the first one does not arise from complement semantics; house cannot be an 
argument of the embedded show in (18c), precisely because it is not a sub- 
ordinate clause but a headless relative, i.e. an NP with no thematic structure 
(equivalently: it has no lexically-specified syntactic lambda). 


(18) a. Ahmet Ayşe'nin ev-i      göster-diğ-i-ni   vur-muş.
        A     A-3s     house-ACC show-REL.3s-ACC   shot
        ‘Ahmet shot the one to whom Ayşe showed the house.’

     b. Ahmet Ayşe'nin ev-i      göster-diğ-i-ni   bil-iyor.
        Ahmet Ayşe-3s  house-ACC show-COMP.3s-ACC  knows
        ‘Ahmet knows that Ayşe showed the house.’

     c. Ahmet'in Ayşe'nin göster-diğ-i-ni   vur-duğ-u    ev
        Ahmet-3s Ayşe-3s  show-REL.3s-ACC   shot-REL.3s  house
        ‘The house at which Ahmet shot the one whom Ayşe showed’

     d. Ahmet'in Ayşe'nin göster-diğ-i-ni   bil-diğ-i    ev
        Ahmet-3s Ayşe-3s  show-COMP.3s-ACC  know-REL.3s  house
        ‘The house which Ahmet knows Ayşe showed’


In summary, if we get the semantics of a construction right, which is to 
decide whether the thematic structure (local lambdas) or the opaque structure 
(inner lambdas) is responsible for its dependency, and typologize the syntac- 
tic aspects of the words accordingly, as PCTT and the rule-to-rule hypothesis 
suggest, then we get the facts of boundedness and unboundedness in syntax
as the corollaries of a purely adjacency-based system. 

Only additional constraints on lexicalized syntactic types can stop the semantics
of the construction from manifesting itself in a language, such as the
word order of English eliminating (14), causing the that-t effect, Inuit’s
restriction of extraction to ergative NPs, or the Latin relative pronoun's
strictness about the morphological case of the extracted element.

Syntactocentric proposals such as subjacency, successive cyclicity (of GB) 
and slash passing (of GPSG) can be thought of as matters to help us pin down 
the syntactic side of (un)boundedness, but the phenomena and the differences 
between them do not need extra mechanisms for explanation other than a 
type-dependent conception of syntactic category based on adjacency, where 
the semantic side uses lambdas as a way of constructing associations with 
thematic roles. 


2. Recursive thoughts and recursive expressions 


Let us have a look at the appeal to extra mechanisms in grammars for the 
purpose of understanding other aspects of (un)bounded behavior.

Daniel Everett (2005, 2009) has argued that the Amazonian language Pirahã
stands as a striking counterexample in that it has no recursion in its grammar
because, among other things, it lacks embedding of phrases. This and other gaps
in Pirahã grammar and lexicon he attributes to the speakers' cultural choice
of insisting on talking about the immediate experiences of interlocutors only.
This property, according to Everett, weakens Chomsky's recent claims that
syntactic recursion is a necessary human trait distinguishing the language
faculty (see Hauser, Chomsky and Fitch 2002).

The key concept in this argument appears to be syntactic embedding. 
Clearly, Everett could not be claiming that the Pirahã could not entertain
recursive semantics as part of their thoughts, such as the semantics of I like
you, I think I like you, I think you think I like you, You think I think you think I 
like you, etc., which we might call the immediate-think language of thought, 
because these can in principle be part of the immediate experience in his ac- 
count. 

A further test for this conclusion can be constructed. Bring for example 
an English-speaking 10-year-old, who might produce the sentences above, 
into an exclusively Pirahã-speaking culture. By Everett’s account and that of
syntactocentrism, which both decide on recursion by the evidence of syntactic
recursion, the recursivity of the underlying thoughts in these expressions
is indisputable. In the course of time the child might drop the English-style
embedding syntax, and adopt the Pirahã style, assuming Pirahã syntax is indeed
nonembedding as Everett claims; see the criticisms by Nevins, Pesetsky
and Rodrigues (2009) and Pullum and Scholz (2009). This would not change the
conclusion that the child had recursive thoughts to begin with, as the syntactic 
criteria had been observed in the child before. 

A reciprocal experiment on hapless children would suggest the same conclusion.
Take a Pirahã child to England. Just because a Pirahã born-and-bred
child could utter syntactically recursive expressions after enough exposure to 
English in an exclusively English-speaking community does not necessarily 
mean the child has learned to think recursively in the new community. 

The uniquely human trait of recursion that Chomsky appears to refer to is 
syntactic recursion, attributed to narrow syntax in Hauser, Chomsky and Fitch 
(2002). The thought experiment provided above shows that no one would 
doubt the existence of recursive semantics for all humans. We can take it as 
common ground and look at its consequences. 

What exactly is semantic recursion? Surely the immediate-think language 
concocted above does not require Y think', which would require both semantic
and syntactic recursion. Recall the formulation of Y using S and K in fn. 35, 
i.e. without syntactic recursion. The K is the crucial element in that defini- 
tion for the present discussion. As Craig proved in Curry and Feys (1958), 
K cannot be defined by the other combinators discussed so far. Thus we are 
either left with the syntactic Y to get Y effects, or face the empirically fatal 
K in syntax, to have syntactic recursion. No data seems to be forthcoming for 
either theoretical move. 

The knowledge of recursion of the kind the word think symbolizes simply
suggests that people who can entertain think'-like thoughts have a knowledge
of their language manifesting the understanding of λxλP.think'Px, where x
is the thinker and P is the thinkee, which can be another thing of the same
sort, i.e. something onto type t. This knowledge manifests itself in English as
(S\NP)/S' : λPλx.think'Px. We do not need syntactic recursion for that even
if the category were (S\NP)/S. Syntactic recursion means a freely-operating
Y in syntax or its functional equivalent, not an argument which is of the same
kind as the result.
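
A small sketch may make the last point concrete. In the Python fragment below, which is only an illustration with names and a tuple encoding of my own choosing, think is an ordinary function whose argument happens to be of the same sort as its result. The nested thoughts of the immediate-think language are built by plain application; no function refers to itself and no fixpoint operator is used.

```python
# Illustrative sketch; `think`, `like`, `depth` and the tuple encoding are assumptions.
def think(P):
    # λPλx.think'Px: P is a proposition-like term, x is the thinker
    return lambda x: ("think'", P, x)


def like(y):
    # λyλx.like'yx
    return lambda x: ("like'", y, x)


t0 = like("you'")("i'")                  # I like you
t1 = think(t0)("i'")                     # I think I like you
t2 = think(think(t0)("you'"))("i'")      # I think you think I like you


def depth(term):
    # count think'-layers by walking a finished term; a check, not part of the grammar
    n = 0
    while term[0] == "think'":
        term = term[1]
        n += 1
    return n


print(depth(t0), depth(t1), depth(t2))   # 0 1 2
```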

Theories such as CCG serve to show that the potential infinity of human 
languages, in the sense of having no upper bound on sentence length or on the 
number of sentences, does not force us to assume a recursive syntax, as we 
have so far managed to live without Y and K. The language of a CCG grammar
is the closure of the YKI-less syntax of Table 2 on the lexical assumptions
that constitute the CCG grammar. We can assume this property because
of radical lexicalization. Nothing moves and nothing is added or deleted by 
the universal rules. Thus Y or K cannot appear out of the blue to yield syn- 
tactic recursion, unless they are part of the knowledge of some words, i.e. 
embedded in a lexical category, for which we have seen no evidence so far. 

Thus it is assumed from the beginning that a language can be potentially 
infinite, not because of syntactic recursion but because of closure, that is, 
from free operation in syntax. Can we entertain the possibility of finite hu- 
man languages? Yes, by taking a finite closure of Table 2, up to a limit on 
sentence size, the number of applications of rules or whatever, on a list of 
lexical assumptions, and proving that the language in question never exceeds 
that limit. That seems to be extensionally doable, but barring the potential in- 
fringement of the future speakers' rights to break that limit, it is intensionally 
quite problematic. 
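
The idea of a finite closure can be illustrated with a toy computation. The Python sketch below makes several assumptions that are not in the text: a four-word lexicon, application as the only rule (Table 2 contains more), and a bound stated in number of words. It computes everything derivable up to the bound by repeated combination of adjacent items, and shows that the set of sentences keeps growing as the bound is raised although no rule mentions recursion.

```python
# Illustrative sketch; the lexicon, the bound and all names are assumptions.
def combine(left, right):
    """Categories obtained from two adjacent categories by application only."""
    out = []
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.append(left[1])                      # X/Y  Y  =>  X
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        out.append(right[1])                     # Y  X\Y  =>  X
    return out


def closure(lexicon, max_words):
    """All (words, category) pairs derivable with at most max_words words."""
    items = {((word,), cat) for word, cat in lexicon.items()}
    changed = True
    while changed:
        changed = False
        for left_words, left_cat in list(items):
            for right_words, right_cat in list(items):
                if len(left_words) + len(right_words) > max_words:
                    continue
                for cat in combine(left_cat, right_cat):
                    item = (left_words + right_words, cat)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items


VP = ("\\", "S", "NP")
LEXICON = {
    "I": "NP",
    "you": "NP",
    "like": ("/", VP, "NP"),     # (S\NP)/NP
    "think": ("/", VP, "S"),     # (S\NP)/S
}


def sentences(max_words):
    return {" ".join(words) for words, cat in closure(LEXICON, max_words) if cat == "S"}


for bound in (3, 5, 7):
    print(bound, len(sentences(bound)))
```

The toy lexicon ignores case and agreement, so strings such as I like I are counted as sentences; only the growth pattern matters here.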

If we only stick to the number of sentences that have been spoken in a 
language up to a certain time, then any language is vast but finite. Call the 
set E, for example English spoken up to September 4, 2009, and a lexicalized 
grammar of E would be our theory of that English. 

Would that theory be useful in understanding the language manifested in 
E? Certainly. It can help us understand why, in the history of gathering the 
E-expressions, we have never encountered for example a sentence in which 
three arguments are extracted out of an embedded clause, or why arguments 
are coindexed indefinitely rather than predicates. We can also wonder why 
the finite-French set F which is locked and sealed at some time appears to 
have the same properties.

We can also wonder why we never see in E the intonational phrasing 
(Three mathematicians in)(ten prefer corduroy), while we see an abundance 
of (All mathematicians prefer) (and some philosophers detest)(corduroy). 
This is the true nature of linguistic explanation, and it does not need the infin- 
ity assumption to be worthy of interest. It certainly would not need syntactic 
recursion either, for the presumed set E is finite. 

Thus Hauser, Chomsky and Fitch's (2002) claim that syntactic recur- 
sion is indispensable, and Everett's (2005) use of that result at face value— 
negatively—to conclude that grammars are constrained by cultural aspects 
and not by universal aspects, are unwarranted. Any grammar reflects a cul- 
tural aspect anyway if two or more people happen to agree that, for them, for 
example S\NP: λx.sleep'x provides the same linguistic recipe of expressing
sleep'-like thoughts in their language. Radical lexicalization predicts that
these constraints have no place in universal syntax, and since there is no other 
locus for formulating these constraints (e.g. phases, spell-outs, cycles, other 
levels of grammar etc.), they must go in the lexicalized grammar of the lan- 
guage. This makes the cultural aspect of grammar a truism. 

We can identify the collective cause of constraints that shape the Pirahã
lexicalized grammar as the immediate experience, as Everett (2009) claims.
Such a unique source would be of great interest to grammarians, as well as
anthropologists and ethnolinguists. The prediction of CCG is that Pirahã surface
syntax is a closure of that identified grammar on Table 2, not a separate,
parallel or parametric mechanism.

In summary, it is not their purported infinity that makes human languages 
worthy of studying scientifically. It is the limited nature of syntactically man- 
ifesting the semantic dependencies. In other words, we seem to be facing a 
Humean problem in linguistics, not necessarily a Cartesian, Lockean, or von
Humboldtian problem. They have assumed tabula rasa or the other extreme, 
and infinity as creativity par excellence. The truth seems to lie somewhere in 
between. 

From a cognitive science perspective, we also seem to be facing an old- 
Platonic, late-Wittgensteinian and Husserlian problem. Knowledge of lan- 
guage can be constructed, as Plato asserted for all kinds of knowledge. But 
the construction is up for grabs, rather than drawn from a concept reposi- 
tory of the mind. We need the practice of hypothesizing rightly or wrongly 
about constructions, which requires the true Platonic skepticism toward such 
constructions after knowledge is constructed. 

False knowledge of words is knowledge if we think it is true by virtue 
of constructability, and as long as we are prepared to think otherwise when 
the states of affairs suggest otherwise, as Hume suggested. Recall that, due 
to radical lexicalization and the combinatory notion of category, knowledge 
of words is the knowledge of language. Any initial bias, such as that conceived
as ‘universal grammar’, serves to narrow down the search space for
the hypotheses about words. It seems to involve a Wittgensteinian play with 
nature to sort out enumerable meanings from experience, i.e. from personal 
history, and with kin to share subjective experiences, and with limited access 
to theories of other minds, as Husserl claimed. Moreover, we cannot assume 
that other species which are capable of handling some semantic dependencies 
are not able to cope with these things among themselves and with nature. The 
fact that they may not (be able to) communicate these to us is irrelevant.

If any computable semantic dependency were syntacticizable in language,
to epitomize human creativity in the infinite capacity of language, we would 
already have a linguistic theory: the Turing machine, with a memory bounded 
by some factor depending on the size of the string of words. Somebody has 
to come forward with some data beyond near context-freeness to make this a 
forced move, rather than some stylistic or idealistic choice. 

A somewhat secondary but not unworthy objection to Everett's (2009) 
claim that Pirahã falsifies Chomsky's conjecture (that recursion is essential)
follows from formal language theory. Hauser, Chomsky and Fitch's (2002) 
argument in general and Chomsky's early writings in particular (when he had 
considered the generative capacity of formal grammars a research agenda 
for linguistics, for example Chomsky and Miller 1963) argue from a class 
of languages. In a class which is considered adequate for natural languages, 
there must be enough automata-theoretic power to do recursion and context- 
free dependencies, whether they are attested in every member or not. That is 
why we try to identify a class of languages with a characteristic automaton. 
It does not follow that all languages in the same class are equally demanding, 
so that we might seek recursion in all of them because we have seen it in one 
(which Everett appears to think Chomsky argued for, which he did not). Take 
a?" and Je? ). Both are in the same class (of recursive languages). 

This point is secondary because the main impetus of the objection is that 
Hauser, Chomsky and Fitch's (2002) argument about the necessity of syn- 
tactic recursion in fact shows the necessity of semantic recursion, and the 
arguments about recursive semantics are quite strong. So are the facts that 
they may be expressed nonrecursively in syntax. Hixkaryana insists on the
nonembedding manifestation of recursive thoughts, such as ‘He went to Kasawa,
because he was wanting to talk with Kaywerye’ or ‘she was picking it
and eating it’ (Pullum and Scholz 2009).

The combinators, and through them adjacency, show that having a syntac- 
tic type dominating a tree containing that type does not necessitate syntactic 
recursion. We need evidence for a Y K syntax or its functional equivalent. No 
word or constituent seems to involve these combinators. 

We must couch a combinatory system of this sort in a set of interfaces so 
that we can accommodate experiential differences, given the limited nature 
of syntacticized semantic dependencies. 

[Figure 6: diagram not reproduced. Recoverable labels: a Lexicon box with entries
such as Milena := S/(S\NP_3s) : λf.fm' and adore := (S\NP)/NP : λxλy.adore'xy, and
an entry for -ed contributing past'; the combinatory projection to constituents of
the form string := syn:sem, e.g. Milena adored := S/NP : λx.past'(adore'xm');
serialization of feature geometry from string and syn; normalization from syn and
sem; realization and intake; inference and valuation; Phonological Form (PF) and
Normal Form (NF); and, outside the box, Phonetic Form and the model world.]

Figure 6. An architecture for linguistic computation.


3. Grammar, lexicon and the interfaces 


We need a mechanism to mediate sounds and meanings “out there”, the types
in the linguistic system, and multiple experiences. We must keep in mind 
that the kinds of meanings in question here are hypotheses about what strings 
mean. They are part of the individual’s grammar. They are not meanings of 
the sort that makes The rose saw Kafka, colorless green ideas sleep furiously 
or Captain Haddock is the president of the Society of Sober Sailors unacceptable
or dubious. This point of clarification cannot be emphasized enough,
and Chomsky makes it quite frequently, recently for example in Chomsky
(2000: 199, fn. 18).

The standardly assumed inverted-Y diagram of linguistic architecture in 
Figure 6 serves as a good base for adjacency syntax, provided that we put 
semantics in the frame and out. I use italics outside the box to symbolize that 
what takes place inside is discretely represented, and what is outside is proba- 
bly not, e.g. sound and light waves, time- and space-varying images, objects, 
air pressure etc. The CCG architecture can be thought of as Figure 6 too. Any 
item in the lexicon that has a syntactic type can take part in the combinatory 
projection, which is handled by the invariant dependencies of Table 2 without 
any intermediaries. That is to say that the grammar is radically lexicalized.

As implicated by the direct translation of the combinators’ semantic
dependencies to their syntacticized counterparts, every constituent gets a
syntactic type and an interpretation. The notion of constituency is likewise
syntacto-semantic: anything that can be combined by syntacticized combi- 
nators is a constituent, including the traditional ones such as adored Kafka in 
Milena adored Kafka, and also Milena adored. Its constituent behavior is at- 
testable in syntax: Milena adored and I believe Wittgenstein might have liked 
Kafka. 

The constituent string which carries a syntactic type and an interpretation 
relates to the phonological form and semantics, which form the linguistic 
system's gateway to articulatory and intensional-conceptual interfaces. The 
normal form at the linguistic end of the interface is the PADS normalized on 
all kinds of conversion, where the applicative structure of the semantics is 
revealed. Both have perceptual correlates, speaking and for example world- 
and object-tracking. It is clear that in Figure 6 the mediator of the PF-NF 
relation is the syntactic type. 

The need for PF and NF to communicate with the interfaces to and fro 
arises from semantics as well. Steedman (2000a) has shown that some
constituencies in English are unaccounted for unless there is a way to
communicate intonational features into syntactic types, and through them to PADS.
That is why normalization (and its reverse, abstraction) must heed both the 
syntactic type and semantics. The model world imagined by the speaker- 
hearer to which it is anchored outside the linguistic architecture needs no 
such linguistic mechanisms. We can safely assume that the referents of the 
PADS terms such as she are known to the speaker anyway in a purely
applicative form. For example, the referent of she was Milena in the utterance
Kafka wrote Milena many letters; she was adored, when uttered by me at
noon February 1, 2010. These terms are abstractions only to the linguistic 
systems of the speaker and hearer, which means that the PADS is only one 
step away from a model-theoretic interpretation. 

It seems clear from Steedman's (2000a) work that constituency and
intonational phrasing coincide in languages where tunes are at liberty to do syn-
tactic work. (This is not the case in tone languages.) The question is how to 
decide which is the determinant, and whether it arises from grammar. These 
issues relate to compositionality. First I note that maximal leftward bracketing 
allowed by constituency is afforded by CCG. It is not complete left bracket- 
ing because of the limited nature of the semantic dependencies, a constraint 
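which seems to be the source of constituency in natural languages.

(19) a. I know that three mathematicians in ten prefer corduroy.
        I := S/(S\NP)   know := (S\NP)/S'
        I know := S/S'   (>B)
     b. I know that three mathematicians in ten prefer corduroy.
        I know := S/S'   that := S'/S_fin
        I know that := S/S_fin   (>B)
     c. I know that three math. in ten prefer corduroy.
        I know that := S/S_fin   three := (S/(S\NP))/N
        math. := N   in := (N\N)/NP
        I know that three := (S/(S\NP))/N   (>B²)
        three math. in := ?   (no derivation)
     d. I know that three mathematicians in ten prefer corduroy.
        I know that three := (S/(S\NP))/N   mathematicians in ten := N
        prefer := (S\NP)/NP   corduroy := NP
        I know that three mathematicians in ten := S/(S\NP)   (>)
        I know that three mathematicians in ten prefer := S/NP   (>B)
        I know that three mathematicians in ten prefer corduroy := S   (>)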


CCG cannot make a nonconstituent interpretable, in semantics or in informa- 
tion structure, thus it makes the narrow claim that constituency is the deter- 
minant. 
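
The leftward bracketing in (19a-b) and the last two steps of (19d) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: categories are atoms or slash pairs without features or slash modalities, only forward application (>) and forward composition (>B) are implemented, and the names Fn, forward_apply and forward_compose are hypothetical, not part of any CCG toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Fn:
    """A functor category: result/arg (forward) or result\\arg (backward)."""
    result: "Cat"
    slash: str
    arg: "Cat"

    def __str__(self) -> str:
        def wrap(c: "Cat") -> str:
            return f"({c})" if isinstance(c, Fn) else str(c)
        return f"{wrap(self.result)}{self.slash}{wrap(self.arg)}"


Cat = Union[str, Fn]  # atomic categories are plain strings


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y  =>  X   (>)"""
    if isinstance(left, Fn) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """X/Y  Y/Z  =>  X/Z   (>B)"""
    if (isinstance(left, Fn) and left.slash == "/"
            and isinstance(right, Fn) and right.slash == "/"
            and left.arg == right.result):
        return Fn(left.result, "/", right.arg)
    return None


VP = Fn("S", "\\", "NP")            # S\NP
I = Fn("S", "/", VP)                # type-raised subject, S/(S\NP)
know = Fn(VP, "/", "S'")            # (S\NP)/S'
that = Fn("S'", "/", "S_fin")       # S'/S_fin
prefer = Fn(VP, "/", "NP")          # (S\NP)/NP

i_know = forward_compose(I, know)               # (19a)
i_know_that = forward_compose(i_know, that)     # (19b)

# Last two steps of (19d), with the derived subject treated as S/(S\NP):
subject = Fn("S", "/", VP)
subject_prefer = forward_compose(subject, prefer)    # S/NP
sentence = forward_apply(subject_prefer, "NP")       # S
```

Each intermediate result here is itself a category, which is the sense in which every prefix that the combinators can build is a typed, interpretable constituent.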

The claim is empirically falsifiable. All legal bracketings are attestable.
Take the kind of constituency exemplified in (19c). The prefix up to and
including the word three can behave as a constituent: I know that every and you
think that some geometers like Euclid. The impossible bracketings are the
impossible constituents (parentheses show intonational phrasing): *(Three
mathematicians in)(ten prefer corduroy), as shown in the latter part of (19c).

Second I note Steedman’s (2000a) observation that, although tunes can
be laid over different kinds of syntactic constituents, and in different orders,
they do the same thing to the phrases on which they are superimposed:


(20) a. Well, what about MANNY? Who married HIM?     Steedman (2000b: 98)

        (ANNA)     (married MANNY.)
        H* L       L+H* LH%
        Rheme      Theme

     b. Well, what about ANNA? Who did SHE marry?

        (ANNA married)   (MANNY.)
        L+H* LH%         H* LL%
        Theme            Rheme


Pitch accents are designated by H (for high), L (for low) and their combi- 
nations. The tone associated with the stressed syllable is designated by suf- 
fixing a ‘*’ to the tone. Following the Pierrehumbert and Hirschberg (1990) 
model of English intonation, we can assume a prosodic organization of
intermediate phrases (ι) which are grouped into intonational phrases (φ).
Intermediate phrase boundaries are designated by L and H, which are distinguished
from the intonational phrase boundary tones L% and H%. 

Their semantic contribution is crucial to interpretability. Pitch accents on 
words are reflected in their syntactic types and in their PADS, such as those 
for Anna and married above. This process can be assumed to take place 
presyntactically as suggested by Steedman (2000a), by a rule of associating 
autosegmental-metrical features with the acoustic correlates of the items in 
the surface string (or with visual correlates in sign languages). It engenders 
derivations such as those in Figure 7. 

Without this communication with phonology, we cannot assume that H*L
is rheme-marking (ρ) and L+H* is theme-marking (θ) in English. This
knowledge has its right place in the PADS, therefore it must be communicated
to it, which can only be done by the syntactic types; see the ‘*’ designations
in the derived PADS of strings above, which are used to represent some value
of important information. The fact that these are lexical choices (Turkish has
no L+H*, and L*H is the theme marker; see Özge and Bozsahin 2010) forces
us to assume that the compositional delivery of information structure ought
to rely on the lexicalized syntactic types, that is, on a lexicalized grammar.

The delivery of compositional meanings for such constituents depends on
the lexical category of the (intermediate) boundary tones. Without their
semantics, i.e. theme- or rheme-marking as a side effect on the PADS,
the communication from phonology about e.g. stress cannot penetrate the
linguistic computation. Many grammatical constituents have been overlooked in
linguistics due to this neglect, such as the following:


(21) (PENCERE-Yİ  Ali), (kapı-yı)  (MEHMET kır-dı.)             Turkish
      window-ACC  A      door-ACC   M      break-PAST
     ‘Ali broke the window, and Mehmet, the door.’


[Figure 7 (derivation): CCG derivation of Marcel PROVED (L+H* L- H%)
COMPLETENESS (H* L- L%), i.e. Marcel proved completeness in response to
What did Marcel prove?, adapted from Steedman (2000a: exx. 67-68).]

Figure 7. CCG and information structure.


Example (21) had been rejected on grounds of its claimed oddity in a “null
context”, but that is precisely the point of bringing the external factors
into the linguistic system in limited ways: to see the potential constituencies
demanded by compositional semantics. The example is perfectly grammatical,
and the following contextualization proves it. Notice that it is not
nonlinguistic recovery from the context or emphatic stress. Note also that
the intonational phrases are delivered as semantically interpretable syntactic
constituents, which are solely responsible for bringing out their information
structure:


(22) a. Ben, kapı-yı   ALİ  kır-dı      zanned-iyor-du-m.
        I    door-ACC  A    break-PAST  think-IMPF-PAST-1s
        ‘I thought Ali broke the door.’

     b. Hayır, (PENCERE-Yİ  Ali)    (kapı-yı)   (MEHMET  kır-dı.)
               H* L-        L* H-   L* H-       H*       L- L%
        No      window-ACC  A       door-ACC    M        break-PAST
        ‘No, Ali broke the window, and Mehmet, the door.’

        [Derivation: CCG derivation of the constituents in (21).]

The example also shows that constituent structure, dependency structure, in- 
formation structure and functional structure can diverge in various ways, and 
the simplest way to bring them together is to have them communicate through 
the syntactic type, rather than devise separate mechanisms for each aspect. 
The first coordinand above is a nontraditional constituent. The new or im- 
portant information is spread over the string, and the functional roles of that 
information are not aligned (window is the object and Mehmet is the sub- 
ject). Such divergences might suggest multistratal syntax, constraint-ranking 
in syntax, or “syntax in LF” where we are forced to do some semantic compu- 
tation in LF using distributional syntactic categories (N, V, A, P) and semantic 
features in them. No extra mechanism is needed if we have combinatory cat- 
egories with limited semantic information, which are kept separately but in 
tight relation to syntactic types. 


4. Making CCG's way through the Dutch impersonal passive 


It is not surprising that the most striking empirical challenges to radical lex- 
icalization arise from semantics, in particular from some semantic criterion 
that can be associated with a class of syntactic objects in seemingly con- 
flicting ways in constructions, such as in unergativity, unaccusativity, and 
telicity. For example Dutch syntax is known to demand from verbs a
particular choice of telicity in auxiliary selection, and another for
passivizability (Zaenen 1993). The potential cooccurrence of these
constructions makes the problem even more challenging.

It should be clear by now that radical lexicalization as a research program
does not mean an easy way out of such problems, such as assuming for Dutch 
two lexical entries for the same verb, one used for auxiliary selection and the 
other for passivization. Unless there are compelling empirical reasons to have 
distinct entries for the verb, most importantly a difference in word meaning, 
such formal clutter in the lexicon is unacceptable. 

I will summarize the problem from the perspective of construction gram- 
mar of Goldberg (1995), who follows Zaenen (1991). The impersonal passive 
requires atelic verbs and verb phrases: 


(23) a. *Er werd opgestegen.                      Goldberg (1995: 15)
        ‘There was taken off.’
     b. Er werd gelopen.
        ‘There was run.’
     c. *?Er werd naar huis gelopen.
        ‘There was run home.’                      Dutch


A class of adverbs apparently related to atelicity can improve judgments: 


(24) a. Van Schiphol wordt er de hele dag opgestegen.
        ‘From Schiphol there is taking off the whole day.’
     b. Er werd voordurend naar huis gelopen.
        ‘There was constantly run home.’           Goldberg (1995: 15)


This aspect seems to contrast with auxiliary selection, which does not 
change depending on the adverb's atelicity, and insists on the verb's telicity 
(atelic verbs select hebben rather than zijn ‘is’): 


(25) a. Hij is opgestegen.                         Goldberg (1995: 15)
        ‘It has taken off.’
     b. Hij is dagelijks opgestegen.
        ‘It has taken off daily.’


Goldberg takes these facts to suggest that the semantics of the impersonal 
passive cannot depend only on the semantics of the lexical items involved— 
particularly verbs. The semantics of the construction itself must play the key 
role. I will sketch a radically lexicalist scenario for the same construction to 
show that this view may be too pessimistic about the combinatory knowledge 
of words and what it can do. My goal is not to carry the analysis to a full 
treatment but to show how radically lexicalist thinking, combined with a
combinatory morphemic lexicon (Bozsahin 2002) and the assumption of
structure in words, can provide a solution to the fragment in (23-25). 

I will assume for simplicity and ignoring other aspects that the impersonal
passive, the unergative verb and the unaccusative have the following lexical
syntactic types in Dutch (‘atel’ is an abbreviation for TELIC=-, and ‘tel’ for
TELIC=+).


(26) -EN       := (S_{i,j}\NP)\_LEX(S_{atel∈i, Ak∈j}\NP)
     lop       := S_{atel∈i, Ak=atel}\NP
     opgesteg  := S_{tel∈i, Ak=tel}\NP
     naar      := (S_{tel∈i}\NP)/(S_{Ak∈j}\NP)/NP
     dagelijks := (S_{atel∈i}\NP)/(S_{Ak∈j}\NP)

Ak (for Aktionsart) is a complex feature including telicity. The feature
without a label, such as S_tel, is VP telicity; it arises from the result type of
the Dutch VP, i.e. S\NP. The lexical choices of adverbs are also shown, where
their passing of the verb’s Aktionsart is projective (index j), and their
syntactic choice of VP telicity (index i) is more liberal. The indices are for ease
of exposition; we can think of them as two different features whose value space
is that of the feature TELIC. The two-pathway system is implicit in van Hout
(2000), where she also talks about the event structure of VPs, not just verbs,
and feature checking of telicity by strong case.

It is easy to see how (23a-b) follow from these assumptions. The dubious 
nature of (23c) can be explained as well. The first derivation below is illicit, 
and the second derivation goes through. (The projection of Ak is not shown 
to save space; cf. (26). Note that, in (27a), VP telicity blocks the derivation, 
not Ak.) 


(27) a. Er werd   naar huis   lop   -EN
        naar huis := (S_{tel,Ak}\NP)/(S_{Ak}\NP)   lop := S_atel\NP
        -EN := (S\NP)\_LEX(S_{atel∈i}\NP)
        naar huis lop := S_{tel, Ak=atel}\NP   (>)
        naar huis lop -EN := ***
        naar huis gelopen := *

     b. Er werd   naar huis   lop   -EN
        naar huis := (S_{tel,Ak}\NP)/(S_{Ak}\NP)   lop := S_atel\NP
        -EN := (S\NP)\_LEX(S_{atel∈i}\NP)
        lop -EN: gelopen := S_{atel, Ak=atel}\NP   (<)
        naar huis gelopen := S_{tel, Ak=atel}\NP   (>)


The difference is whether the passive gets phrasal or lexical scope. Notice
that we capture the basics of Goldberg's and Zaenen's insight, that the con- 
struction itself brings something extra to the example, by letting the adverbial 
decide the overall telicity rather than the verb, if there is an adverb. Otherwise 
it is the verb. This seems consistent with the observation that these cases are 
restricted to a certain class of adverbials, i.e. to certain heads of adverbs (naar 
is telic, voordurend atelic, etc.) 
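
The division of labor among the verb, the adverbial and -EN can be mimicked in a small Python sketch. It is a deliberately crude stand-in for the categories in (26): a VP is reduced to two plain string features, adverbials are functions from VPs to VPs, and -EN is a partial function that accepts only atelic VPs. All names (VP, adverb, passive_en, select_aux) are hypothetical, and nothing here models unification, slash modalities or word order.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class VP:
    telicity: str  # VP telicity: "tel" or "atel"
    ak: str        # the verb's Aktionsart, projected unchanged


Adverb = Callable[[VP], VP]


def adverb(telicity: str) -> Adverb:
    """An adverbial decides VP telicity and passes the verb's Ak through."""
    return lambda vp: replace(vp, telicity=telicity)


def passive_en(vp: VP) -> Optional[VP]:
    """-EN in the impersonal passive: defined only for an atelic VP."""
    return vp if vp.telicity == "atel" else None


def select_aux(vp: VP) -> str:
    """Auxiliary selection inspects the verb's Aktionsart, not VP telicity."""
    return "zijn" if vp.ak == "tel" else "hebben"


lop = VP(telicity="atel", ak="atel")
opgesteg = VP(telicity="tel", ak="tel")
naar_huis = adverb("tel")
de_hele_dag = adverb("atel")
voordurend = adverb("atel")
dagelijks = adverb("atel")

ex_23a = passive_en(opgesteg)                       # None: telic verb
ex_23b = passive_en(lop)                            # atelic verb is fine
ex_27a = passive_en(naar_huis(lop))                 # None: phrasal scope, telic VP
ex_27b = naar_huis(passive_en(lop))                 # lexical scope goes through
ex_28 = passive_en(de_hele_dag(opgesteg))           # phrasal scope rescues telic verb
ex_29 = voordurend(naar_huis(passive_en(lop)))      # atelic adverb outscopes naar
aux_30b = select_aux(dagelijks(opgesteg))           # Ak=tel still visible: zijn
```

The sketch makes one point only: VP telicity and the verb's Aktionsart travel on separate pathways, so the passive and the auxiliary can consult different values of the same feature space without duplicating the verb's lexical entry.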

The potential derivation for the speakers who marginally allow (23c) de- 
pends on the lexical scope for the passive as shown in (27b). The possibility 
of a phrasal scope however is a forced move in the current state of affairs 
because of (24), where it is needed for telic verbs as shown in (28-29) (atelic 
verbs continue to prefer the lexical scope for the passive). The Ak feature is 
ignored here as it plays no critical role in the derivations. 


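(28) Van Schiphol wordt er   de hele dag   opgesteg   -EN
     de hele dag := (S_{atel,Ak}\NP)/(S_{Ak}\NP)   opgesteg := S_tel\NP
     -EN := (S_i\NP)\_LEX(S_{atel∈i}\NP)
     de hele dag opgesteg := S_{atel, Ak=tel}\NP   (>)
     de hele dag opgesteg -EN := S_{atel, Ak=tel}\NP   (<)
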
Once again the adverb decides the telicity because of its syntactic type, which 
can compose over other adverbs as in the case of (29). This is how the telicity 
induced by naar can be shifted to atelicity by voordurend in CCG. 
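
(29) Er werd   voordurend   naar huis   lop   -EN
     voordurend := (S_atel\NP)/(S\NP)   naar huis := (S_tel\NP)/(S\NP)
     lop := S_atel\NP   -EN := (S_i\NP)\_LEX(S_{atel∈i}\NP)
     voordurend naar huis := (S_atel\NP)/(S\NP)   (>B)
     lop -EN: gelopen := S_atel\NP   (<)
     voordurend naar huis gelopen := S_atel\NP   (>)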


The determinant role of the adverbials by which they take any VP but 
return telic or atelic VPs depending on their lexical semantics contrasts with 
auxiliary selection, where the lexical type of the auxiliary selects the verb 
class, e.g. telic for zijn and atelic for hebben. It is a domain restriction, e.g. 
(S_j\NP)$_k/(S_{Ak=tel∈j}\NP)$_k for zijn, which also generalizes over arities.

Thus zijn and hebben look at the Aktionsart (Ak) projected from the verb, 
whereas the impersonal passive looks at the telicity of the VP with or without 
adverbial modification. Without an adverb, the telicity of the VP arises from 
the telicity of the verb. With the adverb, the telicity of the VP is the telicity of
the principal adverb. The Aktionsart of the verb is always projected onto the
VP as Ak, without the adverb's intervention, and telicity is projected as a part
of it. All these properties are preserved in (30). van Hout (2000) further
corroborates, for this complex state of affairs which is nevertheless radically
lexicalizable, that projecting only the event structure of the verb is not enough.


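(30) a. Hij   is   opgesteg   -EN
        is := (S_j\NP)$_k/(S_{Ak=tel∈j}\NP)$_k   opgesteg := S_tel\NP
        -EN := (S_i\NP)\_LEX(S_{atel∈i}\NP)
        opgesteg -EN := S_{Ak=tel∈j}\NP   (<)
        is opgesteg -EN := S_{atel∈i, Ak=tel∈j}\NP   (>)
        ‘It has taken off.’

     b. Hij   is   dagelijks   opgesteg   -EN
        is := (S_j\NP)$_k/(S_{Ak=tel∈j}\NP)$_k
        dagelijks := (S_{atel,j}\NP)/(S_{Ak∈j}\NP)   opgesteg := S_tel\NP
        -EN := (S_i\NP)\_LEX(S_{atel∈i}\NP)
        dagelijks opgesteg := S_{atel, Ak=tel∈j}\NP   (>)
        dagelijks opgesteg -EN := S_{atel∈i, Ak=tel∈j}\NP   (<)
        is dagelijks opgesteg -EN := S_{atel∈i, Ak=tel∈j}\NP   (>)
        ‘It has taken off daily.’
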

Here is the case where the verb is atelic, and chooses the other auxiliary. 
This is of course descriptively speaking because as the syntactic types show, 
the auxiliary does the verb-kind selection in the analysis. Notice that the telic 
adverbs cannot stop the auxiliary from seeing the verb’s Aktionsart (Ak) fea- 
ture (31b). They yield ungrammaticality for independent reasons: the telicity 
of the VP. Its interaction or lack of it with the verb’s Aktionsart is resolved 
by radical lexicalization. 


(31) a. John   heeft   de hele nacht   lop   -EN
        heeft := (S_j\NP)$_k/(S_{Ak=atel∈j}\NP)$_k
        de hele nacht := (S_{atel,Ak}\NP)/(S_{Ak}\NP)   lop := S_atel\NP
        -EN := (S_i\NP)\_LEX(S_{atel∈i}\NP)
        lop -EN: gelopen := S_{atel, Ak=atel}\NP   (<)
        de hele nacht gelopen := S_{atel, Ak=atel}\NP   (>)
        heeft de hele nacht gelopen := S_{atel, Ak=atel}\NP   (>)
        ‘John walked all night.’                   van Hout (2000: 247)

     b. *John   heeft   in een uur   lop   -EN
        heeft := (S_j\NP)$_k/(S_{Ak=atel∈j}\NP)$_k
        in een uur := (S_{Ak}\NP)/(S_{Ak}\NP)   lop := S_atel\NP
        -EN := (S_i\NP)\_LEX(S_{atel∈i}\NP)
        heeft in een uur := (S_{tel, Ak=atel}\NP)/(S_{tel, Ak=atel}\NP)   (>B)
        lop -EN: gelopen := S_{atel, Ak=atel}\NP   (<)
        heeft in een uur gelopen := ***
        ‘*John walked in an hour.’


One loose end in this preliminary analysis is of course incorporating 
Dutch scrambling into it to see its effects on the impersonal passive’s scope- 
taking, which I leave to further study, as Goldberg, van Hout and Zaenen do. 
With phrasal scope versus lexical scope distinctions, it seems possible to work 
out a projection scenario where any VP material in CCG’s sense is composed 
over as above for the passive, or lexically chosen by it. The phrasal option for 
the passive is not a far-fetched theoretical option either; it is the only
possibility in Welsh, which has a periphrastic passive (§1), and no morphological
marking on the verb.

In summary, the auxiliary is the head of auxiliary selection, and the ad- 
verbial is the head of VP telicity if present, otherwise it is the verb, and the 
verb’s telicity always projects. All of these follow from the uniquely lexi- 
calizable syntactic and semantic assumptions about the category of heads in 
Dutch. Notice also that the assumptions of §1 about the passive, that it needs
to see the thematic structure of the verb, which translates on the syntactic side
to the LEX constraint on the slash as ‘\_LEX’ or ‘/_LEX’, are still adhered to in
the category of -EN (26) in an indirect way. Its syntactic type is not
$-schematized, therefore it must take a one-argument predicate, whose thematic
role is therefore visible. This seems consistent with Jaeggli’s (1986) insight
that passive is an external argument absorber. That argument in our Dutch
grammar fragment is the syntactic subject of the unergative or unaccusative
verb, due to the S\NP domain for -EN. It must face a verb because of the ‘\_LEX’
constraint, which prevents it from undergoing composition with adjunct NPs and
verbs in serial verb constructions. Thus all syntactic work is done by the
syntactic types, rather than by morphological types and syntactic types as in
Jaeggli (1986).

We would expect the categories S_tel and S_atel to arise from lexical
semantics, as these are associated with words (naar, voordurend, gelopen,
opgestegen etc.), and projected onto syntax from them. For telicity to do the
syntactic work, such features must be reflected on the syntactic types. Just how
much is projected (and how) is a lexical choice, as predicted by the principle of
lexical head government (PLHG), such as the verb's Aktionsart and the VP
telicity going their separate ways in Dutch because it is demanded by syntax. In
our case, feature percolation can happen if these conceptual-semantic features
were made part of the feature space of the semantic objects in a lexical PADS, 
which in turn codetermines the syntactic type. We can presume that this pro- 
cess might take place as qualia (Pustejovsky 1991) or Jackendoff (1997)-style 
lexical dependency structures. The crucial aspect for the present concerns is 
that this is quite a limited interface with conceptual structure, to ultimately 
find its way to the syntactic type, and it can only happen at the lexical level 
since there is no other level. 

Steedman and Baldridge (2011) show that another Construction Grammar 
favorite, the way construction (Goldberg 1995), is similarly radically lexical- 
izable without any need for extra semantics or syntax over and above lexical 
items. The construction is headed by the reflexive his way (or her way etc.): 


(32) a. Harry slept his way through the final exam. 
b. *Harry slept Barry's/her/their way through the final exam. 


They provide a lexical semantics and a syntactic type for it, which I repeat 
below. The participants and their semantics are clear: a lexical verb, a spa- 
tiotemporal property and a subject. 


(33) -his way := ((S\NP_3s)/PP_loc)\_LEX(S\NP_3s)
              : λPλQλy.cause'(iterate'(P y))(result'(Q y))


Radical lexicalization and CCG's transparent projection give us narrow 
opportunities to make predictions and to check our lexical assumptions about 
cases where the constructions interact, because nothing can intervene or al- 
ter the projection of features and types onto surface syntax, hence we do not 
need to worry about the degrees of freedom that might be exploited in some 
linking rule or pre- versus postspellout. For example, we can test the
lexicalized reflexive constraint above (the ‘\_LEX’ type; note the affix
assumption in ‘-his way’, which is the main input to the narrowed slash).
Fronting and node-raising seem unacceptable:


(34) a. *His way Harry slept through the final exam. 
b. Harry; slept and Barry; worked his ;/,; way through the final exam. 


Thus we do not need assumptions over and above the lexical items and
constitutive principles of the lexicon (PCTT, PLHG, etc.) to understand the
constructions. Construction Grammar's use of argument roles for constructions,
in addition to the participant roles of verbs to explain the phenomenon, forces
one more linking theory into a theory based on mapping principles. Most 
linking theories leak, as the gradual transition of LFG, the most worked-out 
linking theory, to optimality-theoretic syntax has shown. 


5. Computationalism and language acquisition 


Adjacency as the sole basis of all hypotheses about the grammar suggests 
a computationalist scenario for language acquisition. Here also the kind of 
semantics we need is quite shallow, and originally distinct from syntactic 
representation. 

First a point of clarification about the book's perspective on cognitive sci- 
ence. The term computationalism is yet another source of confusion in cogni- 
tive science. There are computational models which are not computationalist, 
and noncomputerized models which are computationalist. Computationalism 
suggests that the aspects that make a problem computationally easy or diffi- 
cult, such as nondeterminism, automata-theoretic resource management, and 
algorithmic space and time complexity, are significant factors in for example 
the child's elimination of her hypothesis space in language acquisition. Ef- 
ficiency of course cannot be the whole story in this endeavor; it will cause 
tension with expressivity as the child grows, and this aspect has to be part of 
a model too. 

The point can be clarified with an example. Suppose that we are trying 
to see the role of homonymy and synonymy in communication. We can start 
with some cognitivist primitives, such as “avoid homonymy" or “disprefer 
synonymy" to model efficient communication. Or we can show through a 
computationalist model that, in a group of communicating agents, having too
many homonyms and synonyms causes late convergence to a common
vocabulary. Such experiments have been conducted by Smith (2003), De Beule,
De Vylder and Belpaeme (2006), Eryilmaz and Bozsahin (2012). The com- 
plexity of the task and complexity of life seem to conspire to constrain the 
behavior, rather than cognitivist assumptions. 

There is another interpretation of computationalism in cognitive science 
and psychology, where it is taken as the agenda of treating symbols as relating 
to the nature of representations, that is, to their encoding in the mind (see 
e.g. Bickhard 1996). Computationalism in the broader sense does not need 
this assumption because computationalist models, whether implemented in
a computer or not, are hypotheses about what connects representations to
solutions, not how they are internalized. This is true of connectionism as well,
a field which is unfairly left out of computationalism wholesale by some
psychologists. Take for example Elman's (1990) modeling of time, in which
a change of input encoding does reflect on the nature of the problem, yet
solutions live or die by computational properties. Thus there is no conflict in
adopting computationalism as a whole, in addition to the interactionism
Bickhard has been advocating.

Let us look at some alternatives to computationalism, for example a cog- 
nitivist treatment of acquisition. It has been argued that nouns are acquired 
first (Gentner 1982). That would be a conceptual bias toward names, objects 
and their perception, hence their first appearance in child language. 


Table 3. Tad’s first words (Gentner 1982) (AmE). 


Age (m.)   Words

11         dog
12         duck
13         daddy, yuk, mama, teh (teddy bear), car
14         dipe (diaper), toot toot (horn), owl
15         keys, cheese
16         eye
18         cow, bath, hot, cup, truck
19         kitty, pee pee, happy, oops, juice, TV, down, boo, bottle, up,
           hi, spoon, bye, bowl, uh oh, towel, apple, teeth


For example, Table 3 shows Tad’s first words starting at 11 months. They 
seem to be adult nouns, and whether they are child nouns strictly we have 
so far no way of knowing. For example, keys might also mean open, or dipe, 
clean. Keren’s first words appear to be similarly reinterpretable (Table 4). 

Mandarin-speaking children aged 20-22 months seem to show no noun-verb bias
(Tardif 1996). This result and a reinterpretation of the results above might
suggest a computationalist perspective, first proposed for machine learning
by Zettlemoyer and Collins (2005), and adopted for language acquisition by
Steedman and Hockenmaier (2007), Çöltekin and Bozsahin (2007).


Table 4. Keren's first words (Dromi 1987) (Hebrew, Israel).


Age m(d)   Child's word   Conven. form   Gloss

10(12)     haw            (?)            a dog’s bark
11(16)     ?aba           (aba)          Father
11(17)     ?imaima        (?)
11(18)     ham            (?)            said while eating
12(3)      mu             (?)            a cow’s moo
12(3)      ?ia            (?)            a donkey’s bray
12(8)      pil            (pil)          an elephant
12(11)     buba           (buba)         a doll
12(13)     pipi           (pipi)         urine
12(16)     hita           (?)            going out for a walk
12(18)     tiktak         (?)            sound of clock
12(19)     cifcif         (?)            bird’s tweet
12(20)     hupa           (?)            accom. making sudden contact w/ground
12(23)     dio            (dio)          giddi up
12(25)     hine           (hine)         here
12(25)     jem            (?ein)         all gone
12(25)     na?al          (na?al)        a shoe
12(25)     myau           (?)            a cat's meow


If we take the problem of language acquisition as manifesting a continu- 
ous problem space for words and phrases, and if we assume that the hidden 
variable in the task is the syntactic category to be learned, whereas the ob- 
servables are a phonological form and the model world, crucially not PADS 
or a logical form, then we would expect the child to start off with some prior 
probabilities on invariants of combination, and proceed as she manages to 
combine rightly or wrongly what she hears as syntactic categories to pair 
with predicate-argument structures. 

For example, upon hearing eat your veggies, the child might think eat
means eat', veg' or even dog' if there is a dog around when the sentence
was uttered. Limited possibilities of combination in CCG, and a conservative
understanding of tracking the world (e.g. Siskind 1995, 1996), will sieve most
of the wrong assumptions as the child experiences more episodes with eating,
dogs and vegetables, eliminating e.g. the hypotheses N: eat' and S\NP: dog'.

An algorithm is provided for this task by Steedman and Hockenmaier
(2007). My running example of dogs, eating and veggies is fashioned after 
theirs. The setup is common to all CCG learners, which dates back to Gold's 
(1967) text model: start with an empty lexicon. For each experience, generate 
some hypotheses that lead to its successful parse, and update the lexicon. Re- 
peat with the new lexicon. In retrospect, the lexicon will have covered all the 
strings the learner has experienced, where "something more" in the Humean 
sense is also learned to cover things beyond a token of experience: the syn- 
tactic type as the hypothesis. 
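
The loop just described (empty lexicon, hypotheses per experience, update, repeat) can be illustrated with a toy Python sketch. It is not the algorithm of Steedman and Hockenmaier (2007): it assumes that hypotheses are bare candidate meanings rather than syntactic types paired with PADS, that there is no parsing and no probability, and that a hypothesis survives only if it is compatible with every episode so far. The function name learn and the episode data are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# An episode pairs the words heard with the meanings salient in the scene.
Episode = Tuple[List[str], Set[str]]


def learn(episodes: List[Episode]) -> Dict[str, Set[str]]:
    """Keep, for every word, only the meanings present in all its episodes."""
    lexicon: Dict[str, Set[str]] = {}      # start with an empty lexicon
    for words, scene in episodes:
        for word in words:
            if word in lexicon:
                # Revise: drop hypotheses this experience does not support.
                lexicon[word] = lexicon[word] & scene
            else:
                # First encounter: every salient meaning is a hypothesis.
                lexicon[word] = set(scene)
    return lexicon


episodes: List[Episode] = [
    (["eat", "your", "veggies"], {"eat'", "veg'", "dog'"}),
    (["don't", "eat", "all", "the", "cookies"], {"eat'", "cookie'"}),
    (["I", "like", "veggies"], {"like'", "veg'"}),
]

lexicon = learn(episodes)
```

After the three episodes, eat and veggies are each left with a single hypothesis, while a word heard only once, such as your, keeps all the candidates of its scene. A realistic learner would also have to survive noisy episodes, where strict intersection would wrongly empty a word's hypothesis set.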

It is crucial that what is learned is a syntactic type. In a way it symbol- 
izes the transfer of experience-specific knowledge to reusable knowledge, or 
perhaps impressions to ideas, to use a more familiar Hume terminology. For 
example, we can conceive that the passive is learned by exposure, but once 
learned, it applies to all argument-taking objects of the right sort because ac- 
quiring the passive means obtaining a syntactic type for it, which is relevant 
to verbs of similar type. 

The working principle here is that the CCG learner collects personal
historical information about derivations of strings (i.e. rule and word use) in
the parse-to-learn paradigm, either by adjusting the model parameters
(log-linear models), or by updating its trust in categories (Bayesian models), in
the manner described by Zettlemoyer and Collins (2005), Steedman and
Hockenmaier (2007), Çöltekin and Bozsahin (2007), Clark and Curran (2007).

That is, its task is to estimate P(c|e), either by discriminative (log-linear)
models or generative models, where c is a syntactic type and e is the evidence
for it in the form of (PF, PADS) pairs, calculated for example by Bayes's rule:


(35) $P(c \mid e) = \dfrac{P(c)\,P(e \mid c)}{P(e)}$


The prior probability P(c) is selected by the learner's history in what she
perceives or (rightly or wrongly) understands; it is her current lexicon's syntactic
distribution. This is not only constrained by experience; universal constraints
filter out some impossible configurations as well.

The Bayesian model sketched so far is not incremental. To estimate the
conditional probability P(E=e | C=c), we need to find out which parses
using the current lexicon and the newly introduced hypotheses give us c, and
among them the probability P(E=e). Some of the earlier experiences will be
related to c as well, hence the need to reparse them to get P(C=c).

Even if we assume that each experience is unique, its subparts are most
likely not all unique (otherwise learning would be very hard if not impos- 
sible), therefore subparts of e and several c's must be considered for each 
experience. For example, eat'veg! might be a new experience when eat’ and 
veg’ are not, such as encountering don't eat all the cookies and I like veggies 
before. Zettlemoyer and Collins (2005) use a limited category inventory in 
lieu of universal grammar to constrain the possibilities of new categories for 
the new experience, and Steedman and Hockenmaier (2007), Cóltekin and 
Bozsahin (2007) rely on universal principles such as those in §5.2 and Chap- 
ter 7. 
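The estimate of P(c|e) by Bayes's rule can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the function name `posterior`, the three candidate categories and the prior and likelihood values are invented, and P(e) is obtained by summing over the listed categories only, which presupposes that they exhaust the alternatives.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return P(c|e) = P(c) P(e|c) / P(e) for every category c.

    P(e) is computed as the sum of P(c) P(e|c) over the listed
    categories (an assumption: the categories are exhaustive).
    """
    evidence = sum(prior[c] * likelihood[c] for c in prior)  # P(e)
    return {c: prior[c] * likelihood[c] / evidence for c in prior}

# Invented numbers: the learner's current lexical distribution (prior)
# and how well each category explains the new experience (likelihood).
prior = {"NP": Fraction(1, 2), "S\\NP": Fraction(1, 4), "NP/NP": Fraction(1, 4)}
likelihood = {"NP": Fraction(3, 4), "S\\NP": Fraction(1, 4), "NP/NP": Fraction(1, 4)}

post = posterior(prior, likelihood)
```

With these numbers the NP hypothesis ends up with three quarters of the posterior mass, and the other two categories share the rest equally.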

We need an iterative method which parses the current experience only with 
the help of the current lexicon—the grammar—and the new hypotheses. For 
example, we can take a weight w to be the learner’s belief that her hypothe- 
sis about a certain category is correct. The following oversimplified formula 
from Çöltekin and Bozşahin (2007) is one example of hypothesis revision.
(Log-linear models such as that of Zettlemoyer and Collins 2007 use easily 
discernible features of parse trees, e.g. number of lexical entries and number 
of applications of a rule, which takes into account the current lexicon and rule 
use.) 


(36) $w = w_0\,(1 + \alpha\beta\,(1 - w_0))$


$w_0$ is the probability (or weight) of the lexical hypothesis c before seeing
the input e. If the hypothesis is already in the lexicon, $w_0$ is the weight
of the hypothesis in the current lexicon; otherwise an arbitrary initial value
is assigned. New hypotheses can be added although substrings of the current
experience have already been seen. For example, if the child somehow thinks
eat := NP:veg' and veggies := S\NP:eat', and the new experience is no veggies,
we can produce no := S/NP:no' and veggies := NP:eat', meaning 'no eating'.

The constant $\alpha$ in (36) is the learning rate, which must be part of an
experimenter's toolbox. We can assume for the child that it improves with
experience. The $\beta$ in the formula is the learner's new evidence that the
category c might help to understand the new experiences. It is calculated as
the number of parses of e in which hypothesis c is used, divided by the total
number of parses of the experience e. It gives new support for the category c
provided by e. The higher the number of parses that the hypothesis supports,
the higher the support value will be. If the hypothesis is used by all the
possible parses of the input, the value is 1. The value gets smaller due to
the parses that do not include the hypothesis. The final term in the formula,
$1 - w_0$, normalizes the result so that the new weight is in the range (0,1].
The final weight is increased with a value directly proportional to the new
trust in c, as shown in (36).
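The revision step in (36) is simple enough to sketch in Python. The function names `support` and `revise`, the representation of a parse as a set of lexical hypotheses, and the starting weight and learning rate are assumptions made for this illustration.

```python
def support(parses, hypothesis):
    """beta: the share of the parses of e in which the hypothesis is used."""
    if not parses:
        return 0.0
    return sum(1 for p in parses if hypothesis in p) / len(parses)

def revise(w0, alpha, beta):
    """Formula (36): w = w0 * (1 + alpha * beta * (1 - w0))."""
    return w0 * (1 + alpha * beta * (1 - w0))

# Two hypothetical parses of "Eat veggies", each a set of lexical hypotheses.
parses = [
    {"eat:=S/NP:eat'", "veggies:=NP:veg'"},
    {"eat:=NP:veg'", "veggies:=S\\NP:eat'"},
]

beta = support(parses, "veggies:=NP:veg'")   # used in 1 of 2 parses
w = revise(0.5, alpha=0.5, beta=beta)        # weight after this experience
```

Because the increment is w0 * alpha * beta * (1 - w0), the weight never decreases and never exceeds 1 as long as w0, alpha and beta all lie in [0, 1].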

This is inspired by Bayesian hypothesis revision but it is not strictly 
Bayesian. Firstly, the implicit assumption is that there is no negative evi- 
dence, as the probabilities do not decrease. One can see no increase in the 
weight of a hypothesis as less belief in it, compared to its alternatives whose 
weight increases. The problem can be alleviated if we can fit a distribution 
for P(e) in (35), but this is rather difficult if not impossible. 

Secondly, the model has no grounds to distinguish infrequent but correct 
hypotheses from incorrect but frequent ones. In the first case, the belief in a 
hypothesis would not increase much, and in the second case, it will continue 
to increase, albeit slowly. 

This weakness is required empirically, because the child is assumed to 
operate in what Gold (1967) called the "text" model, where there is no de- 
cider for any experience e whether a hypothesis about it is right or wrong. (A 
rationalist model for example could take this as a sign that the functional cate- 
gories are innate, because their overt manifestation is infrequent in early child 
speech.) From an empiricist perspective, especially with the narrow under- 
standing of computationalism adhered to in this work, incorrect but frequent 
hypotheses (categories) are bona fide members of the lexicalized grammar of
the child, and infrequent but correct hypotheses need more time to materialize 
in a parse-to-learn paradigm. 

The computationalist twist in such models is that only contiguous substrings
(including the substrings of words discussed in Çöltekin and Bozşahin 2007)
are allowed to bear types, and therefore to carry a meaning. Short strings
are considered more feasible because the algorithms must consider all such
possible pairs, i.e. the powerset of possible PF-PADS mappings, so that we
can be sure the child can in the end potentially manage to bring the correct
pairing to the fore through experience. Such algorithms will show a bias
toward frequent, short or unambiguous strings because these aspects can be
shown to ease the task computationally. For example, the powerset
construction is exponential in the size of the set, which is the set of
hypotheses. Only small values are feasible in a learning model, and the
contiguity assumption is a simple way of reducing it from O(2^n) to O(n^2).
I repeat Garey and Johnson's (1979) numbers for differences in growth rates
of functions as Table 5 (each unit operation is assumed to take one
microsecond). Any n greater than 5 can tell us how these reductions in
problem size can play a role.
Table 5. Growth rates of some polynomial and exponential functions, from Garey
and Johnson (1979: Fig.1.2)


Time complexity   Size n
function          10             20             30             40              50                 60
n                 .00001 second  .00002 second  .00003 second  .00004 second   .00005 second      .00006 second
n^2               .0001 second   .0004 second   .0009 second   .0016 second    .0025 second       .0036 second
n^3               .001 second    .008 second    .027 second    .064 second     .125 second        .216 second
n^5               .1 second      3.2 seconds    24.3 seconds   1.7 minutes     5.2 minutes        13.0 minutes
2^n               .001 second    1.0 second     17.9 minutes   12.7 days       35.7 years         366 centuries
3^n               .059 second    58 minutes     6.5 years      3855 centuries  2×10^8 centuries   1.3×10^13 centuries
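The effect of the contiguity assumption can be checked with a few lines of Python. This is a sketch under one assumption of mine: that the quadratic bound corresponds to counting the non-empty contiguous substrings of a string of n units, n(n+1)/2, against the 2^n subsets of those units. The function names are hypothetical, and the one-microsecond cost per operation follows the convention of Table 5.

```python
def contiguous_substrings(n):
    """Number of non-empty contiguous substrings of a string of n units."""
    return n * (n + 1) // 2

def all_subsets(n):
    """Number of subsets of n units (the powerset construction)."""
    return 2 ** n

def seconds(operations):
    """Running time if each unit operation takes one microsecond."""
    return operations / 1_000_000

for n in (10, 20, 30):
    print(n, seconds(contiguous_substrings(n)), seconds(all_subsets(n)))
```

For n = 20 the subset count already costs about one second, in line with the 2^n row of Table 5, while the contiguous count stays far below a millisecond.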


The computationalist model is falsifiable. The computationalist assump- 
tions would be wrong if we can show that the length of the strings, their 
ambiguity and their frequency do not play a key role. For example, a nouns- 
first cognitivist theory can show one of the following to refute the computa- 
tionalist assumptions: (a) some short verbs are not learned early even when 
they are frequent and unambiguous, (b) some frequently-used long nouns can 
be learned early, (c) infrequent but short nouns can be learned early, and (d) 
some ambiguous but short nouns can be learned early. In all these cases, some 
strong computationalist assumption would be at risk. 

The computationalist view suggests that we take another look at the re- 
sults. For example, for both Tad and Keren, long words seem to be rhythmic 
repetitions, i.e. they engender no ambiguity as the string becomes longer. 
Short nouns can be child verbs too. 

Early acquisition of verbs seems possible (Brown 1998). Interestingly, the 
verbs that Tzeltal children acquire early seem to be argument-specific there- 
fore less ambiguous than opaque verbs. For example, eating tortillas, eating 
beans and eating in general (as in a question) are different words in Tzeltal. 
Some early-acquired verbs such as those for go, make, come are not
argument-specific, but they are the most frequent verbs in the language. 

Brown is not suggesting a verbs-first alternative to the nouns-first proposal 
based on these findings. She shows that the amount of nouns and verbs pro- 
duced from the early one-word stage and prevocabulary explosion are more 
or less the same. This is what we would expect when verbs are specific and/or 
frequent, and nouns and verbs are equally rich in morphology, as in Tzeltal. 

Computationalist models are possible only if we start with the assump- 
tion that the child has access to some semantics, not just to meanings out 
there but to some hypothesis about what she thinks they mean, that is, an 
access to a PADS.?7 The environment and what she hears from it might be 
related to that semantics because her attention is directed by adults when she 
is spoken to. Evaluating the hypotheses of PF-PADS pairs is feasible if we as- 
sume adjacency. With empty categories or with syntactic assumptions on the 
child's understanding (e.g. S, VP etc.) rather than semantic ones, the number 
of hypotheses to consider would be prohibitive. One such proposal, which 
seems only apparently congenial to computationalism, is Hawkins's (1994) 
processing-based account of establishing the basic word orders in languages. 
In his model, as well as in Kayne's (1994) where movement and empty cate- 
gories are bound to come up for consideration at every step of processing, the 
number of possibilities for a parser to consider in the parse-to-learn paradigm 
is quite unconstrained. 


6. Stumbling on to knowledge of words 


The process described in the previous section gives us a recipe to devise ex- 
plicit tokens of knowledge representation for the child's potential hypotheses 
about the words. Their statistical nature might raise doubts about whether 
this way of thinking can live up to the task of explaining why one-word and 
two-word stages of children, and the vocabulary explosion that follows soon 
afterwards, more or less appear around the same time for most children. The 
first thing to note about this doubt is that no-one claims children start tab- 
ula rasa; the task-specific knowledge, namely the lexicalized syntactic type, 
must have severe constraints on its distribution. This is the task of CCG as 
a linguistic theory, in lieu of a biologically determined universal grammar in 
generativism. Secondly, now that we can radically lexicalize all the rules of 
any natural grammar, that is, we have only the knowledge of words to work 
with in hypothesizing, we must show what the experience can do to the rules
in Shimon Edelman's sense, and how. In such experiments we are reminded
of the opening words in his personal web site: “rationalists do it by the rules,
empiricists do it to the rules.” In a radically lexicalized combinatory grammar,
a word's category is the grammar rule because it is an intensional recipe.

This section presents a thought experiment about how a fairly intuitive no- 
tion of word as a grammatical-historical object can be read off from the lex- 
icon. Radical lexicalization and the experiential-semantic understanding of 
"standing on its own in a string" appear to be sufficient for this process. The 
experiment is inspired by computational language learning in the manner of 
Zettlemoyer and Collins (2005), Steedman and Hockenmaier (2007), which 
are inspired by cross-situational learning of Siskind (1995, 1996) and CCG, 
which led to similarly inspired computational models of learning
string-meaning correspondences (e.g. Villavicencio 2002, Bos et al. 2004,
Steedman 2005a, Fazly, Alishahi and Stevenson 2010, Kwiatkowski et al. 2010,
2011), all of which go back in spirit to the later Wittgenstein (1942), Quine
(1960) and Gibson (1966).

The difference of the present experiment from these works is that they 
presume the notion of word and suggest a model of how their meanings may 
arise from use. I will try to suggest a thought experiment about how words 
may arise in the first place. My starting point is to assume that children can 
detect patterns in phonological strings. We can take these patterns to be child 
morphemes, but we need not start with the morpheme. In a related study, 
Çöltekin and Bozşahin (2007) showed that if we start with syllables (i.e. if
only syllables are assumed to be discernible by the child), and run a scenario
similar to Zettlemoyer and Collins (2005) on the Turkish fragment of the
CHILDES database (MacWhinney 2000), 71% of the emerging lexical items
(including bound forms) coincide with those of a model which starts with
morphemes, in 24,000 nouns, out of which 56% are inflected. Their syllable
model does not make assumptions about root/stemhood, hence we can expect
more alignments if we incorporate some prosodic cues about uninflected
words, which comprise 44% of the database (Jusczyk, Hohne and Newsome
1999 and Thiessen and Saffran 2003 suggest that these cues are at work at very
early stages). This is not a bad start to give rise to meanings of things
smaller than words.

Consider the word veggies. One criterion of Di Sciullo and Williams 
(1987) for wordhood in the currently discussed sense is that words are more 
generic than phrases. We have no reason to assume that at the first hearing 
of this word it would be generic to the child. Assume that the child has
gone through a Quinean series of hypothesis forming where many hypotheses 
(most of which might be wrong) have been entertained, much like in Siskind 
(1996).53 

For example, we can assume that the experience (37) might produce the 
correct hypotheses in (38a/a’), as well as those in (38b-c), which are the sit- 
uations in which the string eat is not understood as the verb, but the overall 
experience still spells some kind of predication, simply indicated here by the 
overall result of S. (38d) is another potential set, in which eat’s category is 
correct, but veggie and -s are off the mark. We can take (38) to be delivered 
by a parsed-to-learn paradigm of acquisition. 


(37) Eat veggies. 


(38) a.  eat := S/NP:eat'    veggies := NP:veg'
     a'. eat := S/NP:eat'    veggie := NP:veg'       -s := NP\NP:plu'
     b.  eat := NP:eat'      veggies := S\NP: λx.veg'x
     c.  eat := NP:veg'      veggies := S\NP: λx.eat'x
     d.  eat := S/NP:eat'    veggie := NP/NP:plu'    -s := NP:veg'


This experience cannot lead to the hypotheses in (39a-c) because no
combinator in syntax can combine them to produce a rightly or wrongly
interpretable experience. The distribution of syntactic types S, NP/NP, S/NP
etc. is therefore most likely skewed.


(39) a. *eat := NP:eat'      veggies := S/NP:veg'
     b. *eat := S\NP:eat'    veggies := NP:veg'
     c. *eat := S\NP:eat'    veggie := NP:veg'    -s := NP\NP:plu'
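The contrast between (38) and (39) can be checked mechanically. The Python sketch below is a simplification I am assuming for illustration: it uses forward and backward application only, not the full set of CCG combinators, and it ignores the semantics. Categories are atoms ("S", "NP") or triples (result, slash, argument); the function names are hypothetical.

```python
def fwd(left, right):
    """Forward application:  X/Y  Y  =>  X"""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None

def bwd(left, right):
    """Backward application:  Y  X\\Y  =>  X"""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def results(cats):
    """All categories derivable from the sequence under any bracketing."""
    if len(cats) == 1:
        return {cats[0]}
    out = set()
    for i in range(1, len(cats)):
        for l in results(cats[:i]):
            for r in results(cats[i:]):
                for rule in (fwd, bwd):
                    x = rule(l, r)
                    if x is not None:
                        out.add(x)
    return out

S_over_NP, S_under_NP = ("S", "/", "NP"), ("S", "\\", "NP")
NP_over_NP, NP_under_NP = ("NP", "/", "NP"), ("NP", "\\", "NP")

ok_38a = results([S_over_NP, "NP"])                # eat veggies
ok_38a2 = results([S_over_NP, "NP", NP_under_NP])  # eat veggie -s
ok_38b = results(["NP", S_under_NP])
ok_38d = results([S_over_NP, NP_over_NP, "NP"])
bad_39a = results(["NP", S_over_NP])
bad_39b = results([S_under_NP, "NP"])
bad_39c = results([S_under_NP, "NP", NP_under_NP])
```

Every sequence from (38) yields S, and every sequence from (39) yields nothing, because the slashes point away from the only available argument.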


Note also that a predicate-argument structure is part of the child's hypothesis 
space; it is not the extensional world. For brevity I denoted it with primes. 
We do not start with the assumption that the child knows veggies are veggies, 
where the only unknown would be whether they are Ns or Vs in syntax. Both 
are acquired. 

Now consider a second experience, say (40). 


(40) No veggies. 


This will create more hypotheses about veggies. Let us also take into ac- 
count the nonlinguistic surrounding in the manner of Siskind (1995, 1996), 
and assume that there is a chocolate bar around when this sentence is uttered. 
The child might think that veggies can mean negation (because of no), or that 
it could mean chocolate, veggies, or eating (the last one comes from the
previous experience). We must also allow for the possibility that she might think
"veggies" could mean the noun veggies, or that it could be a verb. Hence as- 
suming veggies are veggies would be an oversimplification; both syntactic 
options must be entertained even if we assume that she has got the string- 
content correspondence right. 

Even in this circumscribed world of two experiences only, the child is ex- 
ponentially less likely to believe that veggies could mean negation, eating, 
plural or chocolate, rather than veggies. The sum of 43 hypotheses is calcu- 


lated as follows.9? 
(41) Experience 1 (Eat veggies)
     eat     := S/NP:  eat', veg'
                NP:    eat', veg'
     veggies := S\NP:  veg', eat', plu'veg', plu'eat'
                NP:    veg', eat', plu'veg', plu'eat'
     veggie  := NP:    veg'
                NP/NP: plu', veg'
     -s      := NP\NP: plu'
                NP:    veg'

     Experience 2 (No veggies; with chocolate)
     no      := S/NP:  no', veg', choc'
     veggies := S\NP:  no', veg', choc', eat', plu'veg', plu'choc', plu'no'
                NP:    veg', eat', no', choc', plu'veg', plu'choc', plu'no'
     veggie  := NP:    no', veg', choc'
                NP/NP: plu', veg', choc'
     -s      := NP\NP: plu'
                NP:    veg', choc'


5 percent of the possibilities, out of a total of 43 chosen above, can re- 


late the string veggies to veg’ as a noun or verb. In contrast, the likelihood of 


no meaning veg is i. the plural A. If we keep a local statistic rather than 


a global one, there would be a set of 36 hypotheses about the set of forms
{veggies, veggie, -s}, and 14/36 of it would relate them to veg'. The total
proportion of associations where the string veggies does not include veg' is
22/36. That seems high, but it covers four meanings (plural, negation, eat and
chocolate) and four types, which are S\NP, NP, NP/NP and NP\NP. By Siskind's
(1996) cross-situational inference, and by CCG's fully lexicalized syntactic
types, the likelihood of veggies covering one of these type-meaning
correspondences is severely less than the veggies := veg' connection. I ignore
here how the plural can come to be associated with veg' using these assumptions
in parsing. For example, veggies can be parsed from veggie := NP/NP:plu'
and -s := NP:veg', where both hypotheses are wrong but they yield the
intended interpretation veggies := NP:plu'veg'; see Steedman and Hockenmaier
(2007), Zettlemoyer and Collins (2005).
Let us add another experience, (42). 


(42) Veggies gone. 


Before this experience, 8/14 of the {veggies, veggie, -s} := veg' hypotheses
considered this relation to be mediated by NP, 4/14 by S\NP, and 2/14 by NP/NP.
The new experience can bring in the hypotheses in (43) (for simplicity I
assume no other factors).


(43) veggies := S/NP:  veg', gone', eat', no', plu'veg'
                NP:    veg', gone', eat', no', plu'gone'
     gone    := S\NP:  veg', gone'
                NP:    veg', gone'
     veggie  := NP:    veg', no'
                NP/NP: veg', plu'
     -s      := NP\NP: plu'
                NP:    veg'
                S/NP:  gone', veg', plu'


This time we fortuitously help the child to discern the noun versus verb 
hypotheses of veggies, but we make plu’ slightly more susceptible because it 
has more opportunities for combination to the left and right. (To be sure, there 
are more hypotheses in this three-scene experience, and some of the hypothe- 
ses considered are not hypotheses in the parsimonious model of Siskind; my 
purpose here is to construe a baseline case by making things bad enough for 
the experiment.) 

With the addition of seven more {veggies, veggie, -s} := veg' hypotheses to
the previous 14, the child is 11/21 likely to believe the connection is mediated
by NP, 4/21 by S\NP, 3/21 by S/NP, and 3/21 by NP/NP, in just three scenes. We can
assume that a language model, in the sense the term is used in computational 
linguistics, i.e. as a model to pick some product of probabilities in a parse- 
to-learn paradigm, will favor the type with higher probability as the primary 
representative of the word in grammar. 
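The tallies behind such proportions can be checked mechanically. The Python sketch below is an illustration only: the two lists are an assumed reading, encoded by hand, of the veg'-bearing hypotheses for veggies, veggie and -s in (41) and (43), and the names `before`, `new` and `shares` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# (form, syntactic type) of each hypothesis whose meaning contains veg'.
# Assumed reading of the first two scenes, listed in (41):
before = ([("veggies", "S\\NP")] * 4 + [("veggies", "NP")] * 4 +
          [("veggie", "NP")] * 2 + [("veggie", "NP/NP")] * 2 +
          [("-s", "NP")] * 2)

# Assumed reading of the third scene, listed in (43):
new = [("veggies", "S/NP"), ("veggies", "S/NP"), ("veggies", "NP"),
       ("veggie", "NP"), ("veggie", "NP/NP"),
       ("-s", "NP"), ("-s", "S/NP")]

def shares(hypotheses):
    """Share of the hypotheses mediated by each syntactic type."""
    counts = Counter(t for _, t in hypotheses)
    return {t: Fraction(k, len(hypotheses)) for t, k in counts.items()}

after = shares(before + new)
favored = max(after, key=after.get)   # type a language model would favor
```

Under this reading NP already leads after two scenes and still leads after the third, which is the sense in which a model that picks the most probable type would settle on NP as the primary representative.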

The NP hypothesis for the word veggies is the top contender after these
three experiences, with a total frequency of 22/55, in which the correct relations,


{veggies, veggie, -s} := {
  S\NP:veg' @ 2/55, S\NP:eat' @ 2/55, S\NP:no' @ 1/55, S\NP:choc' @ 1/55,
  S\NP:plu'veg' @ 2/55, S\NP:plu'eat' @ 1/55, S\NP:plu'no' @ 1/55,
  S\NP:plu'choc' @ 1/55,
  S/NP:veg' @ 2/55, S/NP:gone' @ 2/55, S/NP:eat' @ 1/55, S/NP:no' @ 1/55,
  S/NP:plu' @ 1/55, S/NP:plu'veg' @ 1/55,
  NP:veg' @ 9/55, NP:eat' @ 3/55, NP:plu'veg' @ 2/55, NP:plu'eat' @ 1/55,
  NP:plu'gone' @ 1/55, NP:plu'choc' @ 1/55, NP:plu'no' @ 1/55, NP:no' @ 4/55,
  NP:choc' @ 3/55, NP:gone' @ 1/55,
  NP\NP:plu' @ 3/55,
  NP/NP:plu' @ 3/55, NP/NP:veg' @ 3/55, NP/NP:choc' @ 1/55
}


Figure 8. The total set of hypotheses about the word veggies after three hypothetical
scenes.


veggies:=NP:veg' and veggies:=NP:plu'veg', rank highest, 1 ES which are ex- 
ponentially higher than almost all others. (Figure 8 is the source of these 
numbers.) 

The plural is A likely to mean plu,’ which outranks all other alternatives 
except veg.’ More experiences with the plural will give more diminishing 
returns for assumptions other than plu.' 

More important to our present concern is plu’, which is $ likely to arise 
from -s, which outranks its competitors except plu'veg', which is i per- 
cent likely. The outranking hypothesis is associated with the word veggies. 
Together they embody a cross-situational parsed-to-learn understanding of 
the set { veggies, -s}, along with syntactic types. 75% of -s: plu’ experiences 
are mediated by NP\NP. The plural’s possible connections to the hypotheses 
about eat in the first experience and no in the second one can only be indirect, 
that is, through some wrong assumptions about these words that they meant 
veg,’ because otherwise they cannot be adjacent to plu'-assumed words. Its 
link to gone in the third experience is more direct because they are adjacent. 
This can be observed in -s types of (43). Its relation to the hypotheses about 
veggies is more involved, as can be seen from (41) and (43). 

Once the NP hypothesis about veggies begins to win out, a Humean 
generalization of “something more than the experience” can be assumed to 
take place, where the winning strategy of typing plu’ as NP\NP and call- 
ing veggie-like things NP can come together in parsing other strings such 
as birds and doggies. The types, in other words, conspire to relate certain
bound meanings with certain free meanings once we have sufficient confi- 
dence in them. The other types for the plural and the noun would not be 
so successful across experiences. They are not winning strategies. This result 
comes from the interaction of Siskind's cross-situational inference and cover- 
ing constraints, where the former sieves out the hypotheses by the intersection 
of scene meanings, and the latter eliminates some hypotheses by assuming 
that all hypotheses of an experience must be derived from the meanings of 
the words in an experience (we have somewhat relaxed this assumption but 
not much; in experience two there is no word for chocolate but some words 
were assumed to mean chocolate).
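The sieve of cross-situational inference can be sketched in Python. This is a deliberately strict simplification that I am assuming for illustration: each form keeps only the meanings present in every scene where it was heard, with no carry-over of meanings from earlier scenes and no syntactic types. The function name `cross_situational` and the scene encoding are hypothetical, not Siskind's implementation.

```python
def cross_situational(scenes):
    """Intersect the candidate meanings of each form across scenes.

    scenes: list of (forms heard, set of meanings available in the scene).
    """
    candidates = {}
    for forms, meanings in scenes:
        for form in forms:
            if form in candidates:
                candidates[form] &= set(meanings)   # sieve by intersection
            else:
                candidates[form] = set(meanings)    # first encounter
    return candidates

scenes = [
    (["eat", "veggies"], {"eat'", "veg'"}),
    (["no", "veggies"], {"no'", "veg'", "choc'"}),   # chocolate is around
    (["veggies", "gone"], {"veg'", "gone'"}),
]

candidates = cross_situational(scenes)
```

Only veg' survives for veggies after three scenes, while forms heard once keep all the meanings of their single scene. The experiment in the text is more permissive than this, since it lets meanings such as eat' persist into later scenes.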

Now we can be quite explicit about the form and substance of the
linguistic knowledge of a word: it is the set of categories it can bear, along
with the owner's trust in the members of the set, acquired by the
parse-to-learn paradigm. (Keep in mind that, for the purposes of this book, we
ignore the aspects of morphology and inflection, such as veggie versus
veggies, hence this is only a first approximation.) The collection of such
knowledge comprises
an individual's grammar. For the hypothetical child above in particular, the 
collection might contain the fragment exemplified in Figure 8. 

Notice that the knowledge of the child's word experience is complete. 
(This is a requirement for a computational model of the process, that the cor- 
rect solution be on the search path even if it is not very likely at the beginning, 
since we know that every child converges on the competent use of a word af- 
ter experience.) Her sums add up to 1, for both veggies:=veg’ relation and for 
the possible categories of the word veggies. In this circumscribed and delib- 
erately simplified world, the NP hypothesis for this word is the top contender 
after these three experiences, with a total frequency of 22/55, in which the correct
relation, veggies:2NP:veg,' ranks highest, rA 

One attempt to reduce the possible substantive categories in the search 
space of acquisition is the theory of functional categories, to which we now 
turn. The point of semantics in their case is that we do not need yet another 
innate source of knowledge for words, because although their semantics seem 
robust across languages, they are quite predictable as lexical items.

It is common practice in transformationalism to distinguish substantive
categories such as V(erb), N(oun), A(djective) and P(reposition), from functional
categories, such as C(omplementizer), I(nflection) and D(eterminer), among 
others. As the distinction has no place in radical lexicalism, one might won- 
der whether functional categories are quirky syntactic objects or arise from 
semantic dependencies. 

The first thing to note about them is that they have a parasitic life. They 
depend on substantive categories. A determiner phrase (DP) needs a noun 
phrase, an inflectional phrase (IP) needs a tensed domain like root sentences, 
a complementizer phrase (CP) needs a clause, etc. We can narrow down our 
question to (i) whether these dependencies need combinators, and (ii) why 
they materialize in more or less the same way across languages when they 
manifest themselves. 

Let us start with the last question first. Szabolcsi (1994) establishes the se- 
mantic bond across some apparently distinct functional syntactic items. Her 
subordinators are generalizations of nominal elements such as the article, the 
determiner and the verbal ones such as the complementizer. Their common 
function is to make the predicate or the nominal an argument of another pred- 
icate. For the nominal domain, say for the article, value-raising the article to 
take a noun and look for predicates looking for such arguments is a way to 
capture this behavior. Value-raised categories are those in which the result 
type (value) is a type-raised category, for example (S/(S\NP))/N, rather than 
NP/N. On the semantic side it is accompanied by distributing type raising 
to the arguments, for example λPλQ.(∀x)imply(Px)(Qx) for the quantifier
every. 

For the complementizer, it is usually the identity function, λP.P. The
difference seems natural without the need of a universal. Nominals are
properties and arguments, whereas predicates as arguments do not engender
another predication. There would be nothing over which value-raising could
operate and distribute type-raising. We shall see that once we translate
functional distributional categories to combinatory ones, they have nonvacuous
but semantically transparent functions such as λP.P.

Regarding the first question, whether we need combinators for functional 
categories, we can start with the original motivation for positing functional 
categories: the substantive-functional distinction is meant to capture lexical- 
universal structures. Functional projections, as the theory goes, always bind 
the substantive phrase in the same way, whereas the relations within a
substantive phrase can be language-particular.

Grimshaw (2000) is a summary of the developments and the universal 
claims about functional categories (see also Pollock 1989, Haegeman 1998 
for more functional categories). Her formulation is a good starting point to 
see the possible dependencies, and we can assume a version of it to be part 
of a meta-theory for predicting possible lexical category-feature mappings in 
CCG. (CCG would be overextending itself to cover cases where the config- 
urations are not syntacticized by combinatory dependencies. In this sense, it 
needs meta-theories such as this and for example autosegmental phonology. 
But we must first be sure that the dependencies are not universal but lexical.) 

Among the possible projections Grimshaw reports, the one in (44) is per- 
haps the most expected, which summarizes the motivation for the idea of 
distinguishing functional heads (C, D, I) from lexical heads (N, V, A, P). In 
the text the C-IP head-complement relation is bracketed as [C IP]_CP.


(44) [C [I [V [D NP]_DP]_VP]_IP]_CP


Other possible head-complement configurations according to her are C-VP, 
P-DP and P-NP. The impossibility or oddity of some of the configurations
according to Grimshaw, such as I-DP, V-IP, D-DP, C-VP, I-NP, arises from
her theory of projecting lexical heads only under the guidance of functional
heads.

VP is a lexical projection in (45a) whereas IP is a functional projection 
dominated by it, which is considered illicit. This is impossible according to 
Grimshaw because of the functional mismatch in VP and IP, although there 
is a categorial match between V and IP, say as [+V -N]. (45b)’s violation 
is considered less severe because there is no categorial match in V and DP, 
hence an ambiguous extended projection is expected. 


(45) a. *[V IP]_VP        b. ?[V DP]_VP

From the perspective of heads, their combinatory categories can be given
the following first approximation in association with (44). 


(46) [C=CP/IP [I=IP/VP [V=VP/DP [D=DP/NP NP]_DP]_VP]_IP]_CP


Take the category of C, viz. CP/IP. To say that IP is an inflectional
projection (e.g. S_fin) is to categorize the complementizer as S'/S_fin, as we
have so far assumed for example for that, as in I think that she likes me,
rather than S'/S. A category such as S'/S_agr does not capture CP/IP either.
IP cannot be an agreement domain typewise because, in the domain of locality
of C, that is, in its lexical category CP/IP, such as S'/S_fin, there is no
argument to agree with. (Structure-wise it can have an agreement element in it
such as INFL, in theories that posit functional categories. Agreement as a
type domain cannot rely on this property. Types are string properties, not
tree properties.) Semantically the complementizer translates to λP.P since
there are no arguments or predicates whose dependencies must be heeded.

Consider now the category of I in (46), IP/VP. Positing this category is 
the same as saying that all arguments are type-raised in a competence gram- 
mar, either lexically or by a lexical rule, so that categories onto IP must heed 
agreement, for subject-agreement languages. (Note that this is not a universal, 
e.g. Chinese). 

We can then follow the influential proposal of George and Kornfilt (1981)
to take finiteness as a corollary of agreement, for both verbs and nouns;
see Kornfilt (1984), Abney (1987). For English it means she in she likes
chocolate bears the category S/(S\NP3s), not just S/(S\NP), and likes bears
(S_fin\NP3s)/NP, not just (S\NP3s)/NP or (S\NP)/NP.

It also means that her in she likes her must not bear such decorations
although it carries morphologically the number and person, e.g. S\(S/NP).
Notice that S\NP is VP whereas S/NP is not, thus what we have captured
lexically is the essence of IP/VP. The agreement domain in English is S\NP.
(For Welsh, which is strictly VSO, the difference in agreement domains and
others can be accounted for by S\(S/NP3s) for subject and S\(S/NP) for
non-subject third-person NPs.) The semantics of the process involves no
freely-operating combinator; it is the semantics of lexical T, for example
Joe := NP3s: joe' ⇒ S/(S\NP3s): λP.P joe'.
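The semantics of lexical type raising can be illustrated with Python closures. This is a sketch under assumptions of mine: the names `type_raise` and `likes` are hypothetical, meanings are plain strings and tuples, and only the semantic side is modeled, not the syntactic types or their agreement features.

```python
def type_raise(a):
    """T: turn an argument a into a function over predicates, λP.P a."""
    return lambda P: P(a)

# A curried transitive verb, λxλy.likes'xy: object first, then subject.
likes = lambda obj: lambda subj: ("likes'", obj, subj)

joe = type_raise("joe'")            # the raised subject, λP.P joe'
vp = likes("chocolate'")            # likes chocolate, awaiting its subject

raised = joe(vp)                    # the subject applies to the verb phrase
direct = vp("joe'")                 # the verb phrase applies to the subject
```

Both routes give the same predicate-argument structure; what type raising changes is which of the two items is the function, and therefore which one can carry the agreement requirement in its own lexical category.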

Now consider the substantive category V in (46). It gets a functional in- 
terpretation in structure-dependent theories because of its licit configuration 
[V-DP]VP, which we could translate as V=VP/DP. In CCG, it amounts to
saying that the DP is a nonagreeing argument because VP is the domain of 
agreement, not DP, which we can capture as (S\NPagr)/NP in V’s category 
for English. (For Welsh, the category is (S/NP)/NPagr because the first NP is 
the subject.) 

The mutual dependence of VP and IP on V in distributional-category the- 
ories is captured in combinatory categories by the fact that all the arguments 
are type-raised, and they can differ in agreement. Thus the V-DP configura- 
tion turns out to be a lexical category, viz. (S\NPagr)/NP for English. As V 
is a substantive category in everybody’s theory, it follows that its category is 
not universal, for example (S\NP)/NP: λxλy.read'xy for the SVO English
and (S/NP)/NP: λxλy.read'yx for the VSO Welsh.

Finally, let us consider the functional category D in (46), which translates 
to DP/NP. This conception of NP must be headed by an N rather than a 
determiner. Thus we have DP/N in categorial terms. Considered together 
with the DP category mentioned earlier, the DP/NP assumption amounts to 
saying that all determiners, including quantifiers and names, are type-raised 
or value-raised, since DP necessarily functions as an argument (the N-DP 
configuration is illicit in functional projection theories as well). 

The idea has been around since Russell and Montague (1973) as the theory 
of generalized quantifiers. For example, the categories in (47a) handle (47b), 
where determiner- and name value-raising (and concomitant differences in 
agreement) also handle (47c—e) (assuming Kafka is a name, not a property). 
These are shown in (48). 


(47) a. every := (S/(S\NP3s))/N3s: λPλQ.(∀x)imply (Px)(Qx)
        every := (S\(S/NP))/N3s: λPλQ.(∀x)imply (Px)(Qx)
        Kafka := S/(S\NP3s): λP.P kafka'
        Kafka := S\(S/NP): λP.P kafka'
     b. Every chemist loves Kafka.
     c. *Every chemists love/loves Kafka.
     d. Kafka loves/*love every chemist.
     e. *every Kafka


(48) a. Every              chemist   loves            Kafka
        (S/(S\NP3s))/N3s   N3s       (Sfin\NP3s)/NP   S\(S/NP)
        S/(S\NP3s)
        S/NP
        S
     b. *every             Kafka         *every             Kafka
        (S/(S\NP3s))/N3s   NP            (S/(S\NP3s))/N3s   S/(S\NP)


As expected, the semantics of D cannot be due to a syntactically operating 
combinator. (Note that x and kafka’ in (47) are not syntactic variables.) The 
differences all lie within the lexical syntactic type restrictions. 
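
The derivation in (48a) and the failures in (47c) and (47e) can be replayed mechanically. The sketch below is only an illustration under stated assumptions: categories are nested tuples, the finiteness feature is dropped, Npl is a hypothetical category for a plural noun, and only forward application, backward application and forward composition are implemented.

```python
def F(x, y):
    """Category x/y."""
    return (x, "/", y)

def B(x, y):
    """Category x\\y."""
    return (x, "\\", y)

def forward_apply(f, a):
    """X/Y  Y  =>  X"""
    return f[0] if isinstance(f, tuple) and f[1] == "/" and f[2] == a else None

def backward_apply(a, f):
    """Y  X\\Y  =>  X"""
    return f[0] if isinstance(f, tuple) and f[1] == "\\" and f[2] == a else None

def forward_compose(f, g):
    """X/Y  Y/Z  =>  X/Z"""
    if (isinstance(f, tuple) and isinstance(g, tuple)
            and f[1] == "/" and g[1] == "/" and f[2] == g[0]):
        return F(f[0], g[2])
    return None

VP3s = B("S", "NP3s")                 # S\NP3s
every = F(F("S", VP3s), "N3s")        # (S/(S\NP3s))/N3s
chemist = "N3s"
chemists = "Npl"                      # hypothetical plural noun category
loves = F(VP3s, "NP")                 # (S\NP3s)/NP
kafka_subject = F("S", VP3s)          # S/(S\NP3s)
kafka_object = B("S", F("S", "NP"))   # S\(S/NP)

every_chemist = forward_apply(every, chemist)              # S/(S\NP3s)
every_chemist_loves = forward_compose(every_chemist, loves)  # S/NP
sentence = backward_apply(every_chemist_loves, kafka_object)

print(sentence)
print(forward_apply(every, chemists))       # (47c): agreement clash
print(forward_apply(every, kafka_subject))  # (47e): a name is not an N
```

Both starred cases fail for the same reason, a mismatch of lexical types, without any rule that mentions determiners or names.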

Let us now consider some of the impossible configurations which 
the functional-category theory rules out by purely formal means. Take 
Grimshaw's I-NP and D-DP. Assume an as yet undetermined projection for 
I-NP, say XP. We could categorize I as XP/NP. To be faithful to the seman- 
tics of inflection, which ‘I’ stands for, we must obtain an agreement range. 
No type for XP can deliver this interpretation. Take XP=S. Then S would not 
be an agreement range. Take Sagr for XP. Then the XP of XP/NP must be IP, 
but the IP domain requires type raising of all arguments, and IP/NP, which
would be Sagr/NP in the current assumption, would not be type raising.

Now consider D-DP, where D=XP/DP. Since D=DP/NP is possible, we 
get XP=DP and DP=NP. The last one is the standard assumption in CCG. 
But XP=DP would predict overquantification because D=XP/DP=DP/DP.
The structural equivalent of this assumption would be [D [D NP]DP]DP. This
assumption, XP=DP, cannot capture the semantics of quantification because 
there would be no discernible head for DP. 

In summary, what is called a functional category is in essence (i) a
syntactic restriction on grammatical meanings which narrows down the
compositional meanings that must be delivered by a competence grammar, and
(ii) a faithful reflection of semantic headness on syntactic types. Functional 
categories need not be ordained as special combinatory rules, or special cat- 
egories, because they do not engender semantic dependencies that must be 
captured by a syntactic combinator. Thus there is nothing special about them 
that a lexical category cannot handle; they all belong to the lexicon. They are
special in the sense that they form a closed set, for example, every language 
seems to have a universal quantifier, a finite set of determiners (maybe none), 
a small set of complementizers, a fixed inventory of case markers etc. But 
adpositions and pronouns form a closed class as well, hence this is not their 
definitive feature. 

The choice of a basic category inventory including the functional ones in- 
teracts with accounts of constituency. For example, if complement clauses are 
S rather than S', we would be hard pressed to eliminate (49a) while
accounting for (49b).


(49) a. *[I think that Harry]S/(S\NP) and [Barry]S/(S\NP) like Mary.
     b. [I think that Harry] and [Barry thinks that Mary] owns the house.


A combinatory theory would be overextending itself if it chooses to elimi- 
nate (49a) by some combinatory restriction. The problem does not arise from 
the category of that, which is already onto S', typically assumed to be S'/S or
S'/S'. It is the category of Harry likes Mary as a complement clause, which,
as S, leads to the problem above. If we can type-raise the embedded subject
Harry as S'/(S'\NP), the problem disappears because the conjuncts in (49a)
would not be like-typed for that interpretation:


(50) I          think      that    Harry
     S/(S\NP)   (S\NP)/S'  S'/S'   S'/(S'\NP)
     S/(S'\NP)


Then we have to find an empirical justification for typing the subordinate 
verbs to be onto S' rather than S, e.g. (S'\NP)/NP for like and owns above.
The syntactic aspect of the justification is clear: these are not main clauses. 
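
The like-typedness argument around (49a) and (50) can be checked with a few lines of Python. This is a sketch under assumptions: categories are nested tuples, only forward composition is used, and the "standard" lexicon (that := S'/S, Harry := S/(S\NP)) is set beside the revised one (that := S'/S', Harry := S'/(S'\NP)) purely for comparison.

```python
def fwd(x, y):
    """Category x/y."""
    return (x, "/", y)

def bwd(x, y):
    """Category x\\y."""
    return (x, "\\", y)

def compose(f, g):
    """Forward composition: X/Y  Y/Z  =>  X/Z (None if it does not apply)."""
    ok = (isinstance(f, tuple) and isinstance(g, tuple)
          and f[1] == "/" and g[1] == "/" and f[2] == g[0])
    return fwd(f[0], g[2]) if ok else None

def derive(*cats):
    """Compose a sequence of categories left to right."""
    result = cats[0]
    for c in cats[1:]:
        result = compose(result, c)
    return result

VP = bwd("S", "NP")
I = fwd("S", VP)                      # S/(S\NP)
think = fwd(VP, "S'")                 # (S\NP)/S'
barry = fwd("S", VP)                  # main-clause subject, S/(S\NP)

# Standard assumptions: complement clauses are S.
standard = derive(I, think, fwd("S'", "S"), fwd("S", VP))

# Revised assumptions: the embedded subject is raised over S'.
revised = derive(I, think, fwd("S'", "S'"), fwd("S'", bwd("S'", "NP")))

print(standard == barry)   # like-typed, so (49a) would wrongly coordinate
print(revised == barry)    # not like-typed, so (49a) is blocked
```

Coordination is assumed here to require identical categories on both conjuncts, so the comparison with the category of Barry stands in for the coordination rule itself.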

This may be a good move in English syntax to be able to account for 
examples such as the following without further assumption: 


(51) the man who   I think   and   Barry claims   owns         the house
                   S'/S'           S'/S'          (S'\NP)/NP   NP
                   [S/S']          [S/S']
                                                  S'\NP

I write the standard assumptions in square brackets (i.e. if we assume S as a
result, not S') and the new ones on top to show that it is not the result but the
domain type of substrings such as I think that we should worry about, because
either assumption would give us a residue as a function from S' to something.
Further support for a subordinate category such as (S'\NP)/NP for the
subordinate verb, and also for the presence of O in English syntax, comes 
from the following example which works with the standard assumptions for 
everything else: 


(52) the man
     who             I think   and   who Barry claims   owns         the house
     (N\N)/(S\NP)    S/S'            (N\N)/(S'\NP)      (S'\NP)/NP   NP
     (N\N)/(S'\NP)                                      S'\NP
     (N\N)/(S'\NP)
     N\N


German and Turkish show that the degree of freedom here is still within a 
radically lexicalized grammar: distinct word order in subordinate clauses of 
German, in contrast with second-position verbs in main clauses, and distinct 
Turkish subordination morphology, where word order for subordinate clauses 
is the same as main clauses but morphology differs; the subordinate subject 
and the verb must carry overt agreement morphology which is distinct from 
main clause agreement morphology. In other words, an external constraint or 
rule is not necessary. 

The functional categories seem to have in common the semantic prop- 
erty that they operate over PADSs in which the predicate is always opaque, 
as in the type-raising of arguments, value-raising of properties and partici- 
pants, and complementizer semantics. They cannot latch on to a substantive 
meaning directly. Radical lexicalization makes this aspect very explicit due 
to forced syntax-semantics correspondences in a lexicalized grammar. 

The theory of functional categories can be seen as a quest for more re- 
fined restrictions on lexicalized syntactic types, and also as an aid in search of 
good bootstrappers for learning. Brent (1993) shows how far the idea can go 
in computational learning of lexicalized grammars in an unsupervised way, 
with a warning that it needs a narrowly constrained theory of possible
grammars. The closed set of items does the work of self-supervision. There
seem to be correlates of these assumptions in the acquisition environment of
the child. We know that children are late in producing function words, but 
they seem to zoom in on them early in analyzing utterances (Santelmann and 
Jusczyk 1998), and the frequency of function words is consistently higher 
than the frequency of content words, both in child-directed speech and in 
adult speech, across languages (Shi, Marquis and Gauthier 2006). 

8. Case, agreement and expletives


Some other special categories that serve functionally without apparent
semantic content show characteristics similar to those of functional categories.

Any lexical functor that has an argument (say an NP) in its domain of 
locality can refer to its consistently discernible features, such as case, agree- 
ment, noun class, tone (for tone languages) and locus (for signed nouns). 
There would be no basis for the functor to look at a nondiscernible feature, 
such as whether it modifies a noun that starts with the phoneme /b/, since 
that information cannot be coded in syntactic types. 

A list of typical functors can give us an idea about agreement controllers: 


(53) a. verbs, e.g. S\NP, (S\NP)/NP
     b. adjectives, e.g. N/N
     c. nouns, e.g. N/(N\N), N
     d. determiners, e.g. (S/(S\NP))/N
     e. relative pronouns, e.g. (N\N)/(S/NP)
     f. prepositions, e.g. (N\N)/NP
     g. adverbials, e.g. (S\NP)\(S\NP)


Examples of agreement involving these functions include: subject and verb 
(Portuguese), subject, object and verb (Uralic languages), adjective and noun 
(Russian), noun and noun in possessor constructions (Georgian), determiner 
and noun (German), relative pronoun and noun (Latin), preposition and object 
(Welsh), adverbial and subject (North Caucasian languages). Thus all pos- 
sibilities that are allowed by functor types are attested for argument-taking 
entities, and they cross-cut the accusative-ergative-split classification of lan- 
guages and word orders. 

The radical lexicalization of functional categories (§7) suggests that all
these patterns are lexicalizable, and the lexical combinatory categories that 
arise out of these considerations clearly distinguish agreeing and nonagreeing 
arguments. Take for example some quirky cases of agreement. The combina- 
tory nature of the domain of locality and type raising of arguments facilitate 
a natural account of what is called “brother-in-law agreement” in Relational 
Grammar (Perlmutter 1983), exemplified below. 


(54) a. There are/*is cows in the garden.           Aissen (1990)
     b. There seem to be some bugs in the soup.     Perlmutter (1983: ex.65)
Since this is not triggered by the copula but by the expletive, it follows that
the category of the expletive must take the raising verb as an argument first, 
and value-raise it, which provides a domain of locality where all the agree- 
ment features, including that of the NP following the copula are available to 
the expletive. 

We can think of raising verbs as forming a typewise discernible class in 
the lexicon. Following Clark (1997), Steedman (2000b), I will consider the 
auxiliaries and the copula as raising verbs (55). 


(55) The class of raising verbs: (Vrz)
     are := (S\NP)/(S\NPagr): λPλx.be' (Px)
     might := (S\NP)/(S\NP): λPλx.might' (Px)
     seem := (S\NP)/(Sto-inf\NPagr): λPλx.seem' (Px)


Raising verbs such as seem follow the same pattern in their dependency
structure. Note however the lexical differences, such as the brother-in-law
agreement for the copula and seem. I will collectively refer to them as Vrz.
Their common pattern in the PADS is the crucial aspect of the generalization.

A single lexical category for the expletive there is sufficient to handle 
brother-in-law agreement, without the necessity to posit another agreement 
pattern. As this is lexically triggered by the expletive, it would have no rela- 
tion to the object-agreement systems of ergative languages. 

All NPs in the locality of the expletive have the same agreement informa- 
tion in (56a), which yields the right behavior in (56b-c). 


(56) a. There are cows in the garden
        There := S/((S\NPagr)\((S\NPagr)/NPagr))/Vrz,agr : λPλQ.Q(P self')
        are := Vrz,plu : λPλx.be' (Px)
        cows := (S\NPplu)\((S\NPplu)/NPplu) : λf.f cows'
        in the garden := (S\NP)\(S\NP)
        There are ⇒ S/((S\NPplu)\((S\NPplu)/NPplu)) : λQ.Q(λx1.be' self' x1)
        cows in the garden ⇒ (S\NPplu)\((S\NPplu)/NPplu)
        There are cows in the garden ⇒ S : in' garden' (be' self' cows')
b. There is/*are a cow in the garden. 
c. *There is/are. 


In other words, there not only equalizes argumenthood in semantics (see its 
PADS, where the predicate P is reduced on self’), it also equalizes agreement 
in syntax by underspecification (see its agreement features, which are all agr). 
This stands in contrast with the type raising of all other subjects in English,
which all carry an agreement constraint, for example S /(S\NP3s) for she. 

Radically lexicalizing the neutralization of agreement also gives us an
opportunity to account for the following difference, where there's is another
lexical item:


(57) a. There's many people here. 
b. *There is many people here. 


The expletives are quite idiosyncratic (it is not a neutralizer; cf. 58a–b).
Thus lexical value-raising of the brother NP by the expletive is justifiable 
(value-raising is needed to get the right PADS, and the corresponding propo- 
sitional type for Q is required by the Principle of Categorial Type Trans- 
parency). 


(58) a. It is/*are important that we call the cows home. 
b. It seems/*seem to rain. There seems/seem to be a problem. 
c. *There is himself/herself in the garden. 


The predicate-argument structure of there ensures that the brother NP be- 
comes the maximally PADS-commanding argument (without a linking or 
chain theory), which is consistent with the ungrammaticality of (58c), as- 
suming of course a genuine reflexive reading. Because of the universal nature 
of binding, we would expect all languages with brother-in-law agreement ex- 
pletives to follow (58c). 

The argument depends on the assumption that all arguments are type- 
raised in competence grammars. We can see the empirical consequences of 
this in the following example, where a participant (i.e. type-raised) category 
is acceptable but a property is not. 


(59) Who's going to help me do the dishes?
     Well, there is John/*man.
     there := S/((S\NPagr)\((S\NPagr)/NPagr))/((Sbe\NPagr)/NPagr) : λPλQ.Q(P self')
     is := (Sbe\NPsg)/NP : λx0λx1.be' x0 x1
     there is ⇒ S/((S\NPsg)\((S\NPsg)/NPsg)) : λQ.Q(λx1.be' self' x1)

Notice that the category of the expletive does involve type raising, just like 
other subjects, and the copula agrees with the subject, just like other verbs. 
We are not setting up a separate expletive syntax, or a special nonthematic 
role for the expletive for the verb to worry about. The expletive's uniqueness
is to take a type-raised brother NP category as an argument so that it will have 
lexical access to the domain of locality of that NP. Without this, we could not 
claim to have captured the competent knowledge of the expletive, because the 
examples below could not be handled. Thus the competent knowledge of the 
expletive presumes the knowledge of type raising in the language. 


(60) a. There are cows in the garden and mice in the kitchen. 
b. *There are cows in the garden and a mouse in the kitchen. 


The expletive is the only exception to type raising of subjects, in languages 
with expletives. We can conjecture that expletives are acquired quite late, af- 
ter many syntactic environments have been encountered, giving enough ex- 
posure for type raising of objects to be mastered. 

The point of the expletive's category is that, if we are to account for its 
unique agreement behavior and argument-taking, we cannot simply rely on 
the presence or absence of thematic roles; we must show a PADS that arises 
from syntax like everything else. Its semantics cannot be empty (witness the 
PADS in (56a) which includes a substantive component self'), unless we set
up a special syntax for the expletive. That of course is not the agenda of 
radical lexicalization. 
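
The two jobs of the expletive, building a PADS around self' and neutralizing agreement, can be imitated in a small Python sketch. Everything here is an illustrative assumption: terms are plain tuples, the copula is taken to be λxλy.be' x y, and unify_agr is a hypothetical stand-in for feature unification in which agr is the underspecified value.

```python
SELF = "self'"

# there := λPλQ.Q(P self')
there = lambda P: lambda Q: Q(P(SELF))

# Assumed copula semantics: λxλy.be' x y, built as a tuple term.
be = lambda x: lambda y: ("be'", x, y)

# A type-raised brother NP: λf.f cows'
cows = lambda f: f("cows'")

there_are = there(be)          # λQ.Q(λy.be' self' y)
pads = there_are(cows)         # be' self' cows'

def unify_agr(*values):
    """Unify agreement values; 'agr' is underspecified and matches anything.

    Returns the single specified value, 'agr' if none is specified,
    or None when two specified values clash.
    """
    specified = {v for v in values if v != "agr"}
    if len(specified) > 1:
        return None
    return specified.pop() if specified else "agr"

print(pads)
print(unify_agr("agr", "plu", "plu"))   # There are cows ...
print(unify_agr("agr", "plu", "sg"))    # *There are a cow ...
```

Because the expletive contributes only agr, every clash that arises comes from the verb and the brother NP, which is how a single lexical entry for there covers both singular and plural patterns.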


9. The semantics of scrambling


The radical lexicalization of functional categories (§7, §8) as part of a
theory of feature geometry suggests a clear distinction between agreement and
nonagreement domains of type-raised arguments. We can expect subject- 
agreement languages to type-raise the subject in ways that enforce agreement. 
Likewise, we can type-raise an accusative NP to an agreement domain if there 
is object agreement in the language, as in Uralic languages. 

Since type raising is order-preserving, and its liberal variety in syntax 
would be devastating because of permutation closure (Moortgat 1988a), we 
will not get free word order just because all arguments are type-raised. These 
results suggest that free word order must be a conspiracy of more than one 
grammatical resource. Steele (1978) clearly shows that it cannot be just case 
marking because some languages with morphological case show no sign of 
scrambling (e.g. Albanian), and some languages without case allow it (e.g. 
Classical Aztec, Garadjari). A freely permuting verbal category or some
stylistic (exogrammatical) choice cannot be the answer either because
so-called scrambling languages do impose limits on it, and when it is licensed,
every different order seems to add some information-structural aspect to the
PADS.

The key point that forces us to keep so-called scrambling in grammar— 
therefore do something about its semantics—is that, although there can be 
multiple factors to induce a permuted sentence, all the resources involved 
relate to grammar: case, morphology, intonation, information structure, an 
attempt at the disambiguation of scope, etc. 

Take for example the following sentence pairs from a so-called scrambling 
language. Example (61a) is ambiguous; there can be more than one car. (61b) 
however is not ambiguous. 


(61) a. Her   çocuk araba-ya bin-di.                Turkish
        every child car-DAT  mount-PAST
        'All children went in the car.'
     b. Arabaya her çocuk bindi.


The unambiguity of (61b) is not forced by word order alone. There are pre- 
suppositions, for example, that all the children were waiting. If some children 
have taken the train, ending the event of train-taking, we are back to an am- 
biguous interpretation. A competence grammar should deliver both readings, 
which are different semantically to begin with, and an oracle must choose 
between them depending on context and intonation. The oracle is going to
need some grammatical information to disambiguate, and the delivery of that 
information is the grammar's responsibility. Thus radically lexicalized gram- 
mars must deliver different things about different word orders, otherwise the 
grammar itself must be the oracle. This would fly in the face of radical lexi- 
calism because it amounts to saying that all contextualizations must be lexi- 
calized in the grammar, a result which seems theoretically possible but very 
unlikely. 

A competence grammar of Turkish must also handle the apparent asym- 
metry caused when the same process of word order flexibility is repeated 
postverbally. In the examples below, we are not forced to think of elabo- 
rate alternatives or presuppositions to see that both are ambiguous. Kornfilt 
(2005), Kural (1994, 1997) concur with these observations. 


(62) a. Bindi her çocuk arabaya.                    Turkish
     b. Bindi arabaya her çocuk.

The postverbal process seems language-specific, suggesting a lexicalized
solution to the syntax-phonology interface, rather than some universal. For 
example, a Russian speaker could say Denis udaril Sashu to mean either ‘De- 
nis hit Sasha’ or ‘It was Sasha that Denis hit’, but a Turkish speaker would 
never use this word order to convey the second reading. 

This facilitates a minimal comparison of alternative grammars to see for 
example the interaction of the semantics of the accusative case and the cat- 
egory of the verb. The Turkish verb must be typed head-final in the lexi- 
con to account for the contrast in (61) and (62), otherwise case marking it- 
self cannot deliver information about head-finalness of surface word order to 
an oracle. The reason is as follows. If we categorize the transitive verbs as 
S{|NPnom, |NPacc} to handle all variations on word order, where the set
notation indicates arguments in any order (following Baldridge 2002), a
backward type-raised accusative cannot be assumed to take the role of
indicating a postverbal order; both orders below would be fine with that type:


(63) a. her   çocuk        çukulata-yı     sev-di
        every child.NOM    chocolate-ACC   like-PAST
        (S/NPacc)/(S/NPacc\NPnom)   S\(S/NPacc)   S{|NPnom, |NPacc}
        S/(S/NPacc\NPnom)
        'All children liked (the) chocolate.'
     b. sev-di               çukulata-yı    her çocuk
        S{|NPnom, |NPacc}    S\(S/NPacc)    S\(S/NPnom)
        S/NPnom

The verbal category must be revised to fix this. If we assume Turkish is 
head final, i.e. the transitive verb is of type S{\NPnom, \NPacc}, then back- 
ward type raising cannot derive (63). Forward type raising cannot help with 
the asymmetry either because it cannot deliver (63b) in the first place. Now 
we must call in another resource, which we must relate to intonation because 
we have used up other resources. In a radically lexicalized grammar, this must 
arise from a lexical category, which has been identified as the lexical rule for
rightward contraposition by Özge and Bozşahin (2010):


(64) NP → Sβ\(Sβ\NPβ)     (β for background)     (T)


The rule says that all nominals, irrespective of their case, yield a different 
kind of sentence when they are backgrounded, to deliver a rheme- or theme- 
backgrounded clause. The proposal is purely type-dependent, not position or 
structure-dependent, because it simply correlates an exclusively
backward-looking category with backgrounding. The β feature is reflected on the PADS
objects as a side effect, by marking them background rather than more salient 
or contrastive. Because of the result's directionality in (64), it can only com- 
bine arguments that are postverbal, which indirectly (i.e. grammatically) as- 
sociates postverbalness with backgrounding in Turkish. Thus we have all the 
information to be delivered at the interfaces to communicate the informa- 
tional differences between (63a) and (63b) via their PADS: 


(65) a. her çocuk      çukulata-yı                   sev-di
        S/(S\NPnom)    (S\NPnom)/(S\NPnom\NPacc)     S{\NPnom, \NPacc}
        S/(S\NPnom\NPacc)
     b. sev-di               çukulata-yı        her çocuk
        S{\NPnom, \NPacc}    Sβ\(Sβ\NPβ,acc)    Sβ\(Sβ\NPβ,nom)
        Sβ\NPnom
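
To see what the set notation in the verbal categories of (63) and (65) commits the lexicon to, the following Python sketch unpacks a set-valued category into the ordinary curried categories it abbreviates. The function expand and its string output format are assumptions for illustration; "|" is read as "either slash", and the order of arguments in the set is free.

```python
from itertools import permutations, product

def expand(result, args):
    """Unpack result{args} into a set of curried category strings.

    args is a list of (slash, category) pairs, where slash is "/",
    "\\" or "|" (direction left open). Every ordering of the
    arguments and every resolution of "|" yields one category.
    """
    out = set()
    for order in permutations(args):
        choices = [("/", "\\") if s == "|" else (s,) for s, _ in order]
        for slashes in product(*choices):
            cat = result
            for slash, (_, arg) in zip(slashes, order):
                cat = f"({cat}{slash}{arg})"
            out.add(cat)
    return out

# S{|NPnom, |NPacc}: the freely permuting verb of (63)
free = expand("S", [("|", "NPnom"), ("|", "NPacc")])

# S{\NPnom, \NPacc}: the head-final verb of (65)
head_final = expand("S", [("\\", "NPnom"), ("\\", "NPacc")])

print(len(free), len(head_final))
print(sorted(head_final))
```

The head-final category keeps argument order free but never yields a forward slash, which is why postverbal arguments need a further lexical resource such as the rule in (64).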


Now we can clarify the semantics of the accusative case which in some ac- 
counts is assumed to be vacuous. It cannot be directly information-structural 
or about definiteness, because such matters are not always lexicalizable.
Witness (63a–b), where the accusative NP is not necessarily definite. It is not
necessarily a theme or rheme either. Therefore a lexical category for the
accusative marker must be neutral, i.e. it must be λP.P, which by definition
makes P predicational.

What makes P a dependency arising from a transitive verb is its syntac- 
tic type, not its predicate-argument structure. It can be indirectly information 
structural, as in (64), which presupposes that it has a PADS to begin with so 
that an update on that PADS can take place. Since the syntactic type of the 
accusative forcibly faces a λP.P semantics, it has no room for substantive
side-effects. It can only pass down the informational features, which must be 
put in the PADS and the syntactic type by other items. That is why the ac- 
cusative can only be a projector of informational features, rather than being an 
instigator. See Özge (2010) for more arguments supporting this conclusion.

Everything in the lexicon must have a PADS, i.e. semantics, otherwise we 
cannot account for interactions between the lexical categories. That is, we 
cannot create a grammar. 

10. Searle and semantics


Assuming that there must be some kind of semantics in the grammar, and that 
the kind of semantics we can put in the grammar must be compositional for 
syntax to do its work, we can question whether this semantics is just another 
name for syntax, or for formal symbols. The issue relates to the longstanding 
argument that syntactic manipulation alone cannot give rise to meaning. 

Searle (1980) in his Chinese Room thought experiment sets out to show 
that a purely formalist account of the mind is not possible. It relates to our 
present concern because he chose language, in particular semantics, to make 
his case. The specific claim he was arguing against is strong AI, the claim that 
a functional interpretation of the mind counts as a mind. This view according 
to Searle is bound to fail in its aspirations because the kind of computation it 
envisages is formal, i.e. it operates over symbols with no content, whereas the 
mind sets up, he claims, relations between intentional states and the world, 
via causal powers of the brain. We must have “the right stuff,’ i.e. a human 
brain, to have that causal power, according to Searle. 

In the same article (and subsequently in 1990a), Searle addresses possible
objections to his claim, which are mainly concerned with what is embodied 
in the Chinese room. Searle called them "the system's reply", "the robot
reply", "the brain simulator reply", "other minds" and "other mansions" reply,
and their combination, against which I believe he defends his position quite 
convincingly. 

Recall also some other criticisms, such as Rey's (1986) argument that 
mental states are species-specific for all species anyway, which to me
suggests that ascribing semantics to certain states of a machine ought to be
constructed by the machines, and the experience cannot be presided over by an
external judge. 

Rey's argument I think brings a Husserlian perspective into the debate 
in which we can talk about sharing the subjective experiences of humans 
among themselves but most likely not with cats or ants, which leaves open 
the possibility that they can do the same thing and do not inform the humans 
about it. It amounts to saying that the mental states can be real for all species. 
With a stretch of imagination we can grant the same ability to machines that 
perceive, act and react. However, I will not follow this line of argument. 

It is interesting that the debate continued between Searle, the philosophers,
psychologists and AI researchers, with almost no argument from linguistics 
(but cf. Carleton 1984). I offer one in this section from philosophy of
linguistics, to question whether the Chinese room as imagined by Searle is
possible. My argument is about what Searle considers computational, and about
the linguistic conception of the same notion, which must, according to many 
cognitive scientists, indirectly relate to semantics. 

First a summary of Searle's argument, from a more recent self state- 
ment (Searle 2001): imagine a native speaker of English, who has no knowl- 
edge of Chinese, locked in a room full of boxes of Chinese symbols (a 
database) together with a book of instructions written in English (the pro- 
gram), which he can interpret, for manipulating the symbols. More Chinese 
symbols are sent in to the room (questions), which the person in the room cor- 
rectly answers in Chinese symbols by following the instructions for match- 
ing the database symbols and the symbols in questions. The person passes 
the Turing (1950) test in communicating Chinese, that is, a native speaker of 
Chinese at the other end of the box cannot tell that the answers are not com- 
ing from a Chinese speaker. Yet the person in the room does not understand 
a word of Chinese. The program and the database add no understanding of 
Chinese to the person, though he already knows how to interpret symbols in 
one language, namely English. By extension, computers cannot understand 
Chinese (or any human language) by purely formal manipulation of symbols. 

The linguistic aspect of the experiment I think is as follows: what is Chi- 
nese in the Chinese room is the database and the fragments of the program 
that contains Chinese symbols and their abstractions (the program is in En- 
glish, but it is about Chinese symbol correspondences). The program cannot 
be of infinite size (otherwise it would not be a program), therefore the corre- 
spondences in the program cannot be phrase-to-phrase matchings, for we can 
conjecture that there are potentially countably infinitely many Chinese ex- 
pressions. (Or, if we take the infinitude claim to be less critical for language, 
as I have argued to be the case in §3.3, then we can say that the competent
speaker's knowledge of phrase-to-phrase matchings would be too large to fit 
into any room.) 

Hence the program must contain finitely characterizable symbols and their 
program-internal abstractions, such as calling a group of symbols a certain 
kind of category, and certain combinations of categories to be other cate- 
gories, and so on and so forth, in other words, a grammar of Chinese. It does 
not matter for our current purpose that such a grammar is not necessarily 
lexicalized; its finite representability is the key point. 
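
The step from finite size to grammar can be made concrete with a toy example. The Python sketch below uses a small English-like rewrite grammar, not Chinese and not a CCG; the rule set and the function expand are assumptions chosen only to show that a fixed, finite rule table yields more and more expressions as embedding depth grows, which a finite list of phrase-to-phrase matchings could not do.

```python
from itertools import product

# A finite rule table (hypothetical toy grammar with clausal embedding).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"], ["Vs", "S"]],
    "NP": [["Harry"], ["Mary"]],
    "V": [["likes"]],
    "Vs": [["thinks"]],
}

def expand(symbol, depth):
    """Return all strings derivable from symbol within depth rule levels."""
    if symbol not in GRAMMAR:      # a word: nothing left to rewrite
        return [symbol]
    if depth == 0:                 # out of budget for further rewriting
        return []
    results = []
    for rhs in GRAMMAR[symbol]:
        parts = [expand(s, depth - 1) for s in rhs]
        for combo in product(*parts):
            results.append(" ".join(combo))
    return results

for depth in (3, 5, 7):
    print(depth, len(expand("S", depth)))
```

The rule table never changes size, yet each additional level of embedding adds new sentences, such as "Harry thinks Mary likes Harry" at depth 5.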

In the thought experiment we must assume that the program contains a 
(competence) grammar because we can "suppose also that the programmers
get so good at writing the programs that from the external point of view –
that is, from the point of view of somebody outside the room in which I
am locked – my answers to the questions are indistinguishable from those of
native Chinese speakers." (Searle 1980; my emphasis).

Let us now turn to the boxes of Chinese symbols. They would minimally 
contain Chinese vocabulary, and perhaps more, such as a large inventory of 
expressions based on symbols in the program. This too must be finite to fit 
into the room. We thus have a system of grammar and a lexicon housed in the 
room. 

I claim that the experimental setup is inconsistent because of the forced 
assumption of housing a grammar, and not being able to use it for semantic 
interpretation. All grammars in any linguistic theory are interpretable because 
their product is there solely to provide a full array of phonetic, semantic and 
syntactic interpretation. The theories only differ on how they go about getting 
these interpretations from a surface string, and how to explain them. 

What, then, is the problem with computation in Searle's program? In the 
linguistic sense, the program is not doing computation at all, because compu- 
tation is what links the string (the phonological form) and the meaning (say 
the PADS) at the interfaces to perceptual and conceptual systems of cogni- 
tion. The link is the critical assumption, and needs further refinement. 

In the Minimalist Program of Chomsky (1995), computation is conceived 
as the operation that links the stages of deriving a surface string, where inter- 
mediate results as syntactic objects are kept for later use. It seems to me that 
Searle's choice of natural language computation for his thought experiment 
is inspired by an interpretation of Chomsky in an early incarnation. 

Chomsky nowadays maintains that the interpretation of the string be- 
gins after its features are delivered to spell-out, at which point its access to 
lexicon—hence to meaning—is cut off, and the string is ready to be pro- 
nounced. More specifically, Searle seems to have in mind what Brody (1995) 
later called radical minimalism, where the phonological form is just an inter- 
pretation of a single interface, and the semantic interpretation rules and the 
lexicon have access only to that interface. This seems to be a more literal 
implementation of having a single hole in the box for outside access. This 
might appear to suggest that what takes place is essentially formal symbol 
manipulation of the morphological or phonological kind. 

This is not entirely correct. Interpretable features are always carried within 
the intermediate records of syntactic objects. This was true in the pre-spell- 
out period of Chomsky as well, under different guises. I am in no position to 
defend the Chomskyan view of carrying the semantics along, but clearly we 
do need room for these features in a faithful thought experiment of syntactic 
rule manipulation. 

Moreover, we have seen that syntax and semantics can be derived in lock- 
step so that they are available at any time. For this to work in the Chinese 
room, semantics must be allowed to enter the room as well rather than ex- 
pected to rise in it. Radical lexicalization shows that these meanings will 
arise only from the meanings of the words in the string because there are no 
intermediaries, and the semantics of common dependencies is invariant. 

Therefore the Chinese room as a whole must have access to strings and 
meanings outside the room to be able to hypothesize about internalized mean- 
ings. Marconi (1997: 137) raises a similar objection: “a meaningless linguis- 
tic symbol cannot be made meaningful by being connected, in any way what- 
soever, to other uninterpreted symbols.” It appears then that Searle is argu- 
ing from one conception of language computation, which is not universally 
shared (and might be considered dubious by its practitioners), to show that 
syntax suffices to legitimize his picture of the Chinese room. 

What takes place in the room is not computation in the computing science 
sense either, for computing in that sense is a link too: it links programs 
(the form) with executable code (the meaning) at the interfaces of the 
machine to the programmer's expressions and intended tasks, the latter of 
which cannot be determined by the computational system. 

We are reminded of Searle's (1990b) claim that running the WordStar program 
might as well be undertaken by the wall behind him, since the wall is 
complex enough to embody the formal structure of WordStar. This is a gross 
oversimplification of computation. Programs execute only when they are 
interpreted by the "right stuff", which in their case is a virtual machine 
instruction set. If the wall has the right stuff, then surely it can execute 
WordStar, but then it would be a brick-implemented computer rather than just 
a wall. Rey's (1986) warning that strong AI is not behavioralist but 
functionalist makes the same point. I am not defending strong AI against 
Searle, but it needs to be said that he makes the same oversimplification of 
rule-following in computation as he does in syntax. 

An uninterpretable program has no semantics—it is not a program, 
whereas a program that does nothing has one, with perhaps free interpre- 
tation in the programmer's world. Thus Searle's criticism of formal symbol 
manipulation as the basis of understanding may be directed towards possi- 
ble reductionism of some programmers doing nothing but syntax, or for not 
showing anything interesting in the way of semantics in current practice, but 
it is not an intrinsic problem of computation. 

One might argue that semantics as conceived above is not really semantics 
because it is not situated in the external world, but this is precisely the point 
in linguistics and computing: language-internal semantics is only a gateway 
to the conceptual system, then to the world, where meaning cannot be deter- 
mined by language. Language provides a semantic representation over which 
external (anchored) meanings can be enumerated. That is, understanding is an 
interface problem of connecting internal and external meanings for all kinds 
of species, natural or artificial. 

Melnyk’s (1996) objection to Searle follows a similar line of thought for 
programs. Marconi’s (1997) point about inferential and referential knowledge 
of words as lexical competence, independent of whether the doer of symbol 
manipulation is natural or artificial, carries the same message: “The genuine 
problem is not whether knowledge of meaning can be “reduced” to symbol 
manipulation but what kind of symbol-manipulating abilities would count as 
knowledge of meaning or understanding of language” (Marconi 1997: 137). 

If this is the case, then a computational system can in principle be made 
to face the same conditions as the child for understanding the connections 
between sounds and meanings, once we readjust our semantic radar and in- 
corporate compositional meanings into the notion of category. 

There is already some progress in the way of breaking the “semantic di- 
vide” of a child’s acquisition of language and a computational learning of 
human language. Zettlemoyer and Collins (2005) experiment with statistical 
learning of grammars (§5) in which the training data (for the machine) are 
sound-meaning pairs, and in which syntax is a hidden variable. This is a sys- 
tem which takes as a start the assumption that there is no external access to 
the internal states of a program such as Searle’s. They use a limited category 
space in lieu of a universal grammar to control the search-space problem for 
the hypotheses the system generates, and we can assume that the substantive 
and formal principles can do the same task for the child in the manner de- 
scribed in §5. Therefore, the input to the room must be sound-meaning pairs 
in order for computation to take place inside the room, and syntax—more 
specifically parsing—is what happens inside. 

Led this way, the system learns a fully interpretable grammar, of course 
with errors and approximations, but with the possibility of correcting them 
by exposure to further data. The crucial computationalist assumptions in their 
algorithm are that shorter, contiguous and less ambiguous strings are 
entertained first, because the system must look at the powerset of 
alternatives to guarantee that the correct hypothesis is always among the 
candidates. Without these assumptions, we cannot assume that once hypothesis 
selection is 
out these assumptions, we cannot assume that once hypothesis selection is 
down to a single candidate or very few candidates, we are done. 

The results are too preliminary to be conclusive, but they point out prin- 
cipled directions for discerning the methodological and intrinsic problems of 
computing. I conclude that (a) Searle's Chinese Room is linguistically in- 
adequate, and (b) it can be made consistent with bona fide computation, in 
which case the unduly pessimistic belief that a computational system cannot 
be made to face the same conditions for understanding as humans is not war- 
ranted. The key point is having access to semantics as an independent channel 
of intake and output, as assumed in the inverted-Y diagram of Figure 6. 

In this setting, assuming an opaque computation by the invariants of CCG 
helps us narrow down the remedy when learning goes awry: the semantics 
of the invariants have no substantive constraint on their PADS. For example, 
composing love′ and hurt′ to get B love′ hurt′ can only go wrong if we have 
the wrong assumptions about love′ or hurt′. The semantics of B is invariant. 

The opaqueness of the invariants and the transparency of the substantive 
assumptions (experiential knowledge) further reveal the nature of computa- 
tion in CCG. It is a monad, where these processes are threaded rather than 
performed independently. 


Chapter 10 
Monadic computation by CCG 


The possible landscape of substantive categories can be significantly reduced 
by considering the codetermination of syntax and semantics under a single 
fundamental assumption, adjacency. But it might seem excessive that CCG 
makes use of so many invariants as its combinatory base to do that (see Ta- 
ble 2 for a long list). The reason I have suggested is that factoring the com- 
binations as such makes the grammatical process completely syntactic type- 
driven and transparent to the sources of types, to morphology, phonology 
and lexical semantics. Nothing needs to be remembered during a type-driven 
derivation. This seems to be a prerequisite to work towards understanding 
parsing as a reflex. 

Nevertheless, one would expect in a purely applicative system that ap- 
plication as its primitive would stand out against all others. Recent analyses 
indeed suggest computationally distinguishing dependency and application 
in CCG. No constraint has been found necessary so far on the syntacticized 
combinators B and S, in controlling the projection of features of radically 
lexicalized types. (More accurately, all the earlier constraints on combina- 
tory rules have been replaced by constraints on lexicalized syntactic types.) 
Combinatory dependencies always project all features. 

Some constraints seem inescapable for application. It will follow that 
combinatory dependencies can be opaque processes whereas application 
must be transparent so that we can apply the constraints. These findings reveal 
the monadic aspects of CCG, suggesting that CCG's one-step computation is 
a two-stage process, as in monads. 

Monads are quirky mathematical objects. They are in fact ubiquitous in 
everyday computing. For example, the famous Unix "pipe" (invented by 
Douglas McIlroy in the 60s) is represented as ‘|’, and it threads a sequence 
of computations by chaining their input and output, which is now called the 
I/O monad. If n processes agree to take input and produce output in a 
standard way (called streams), we can chain them as p_1 | p_2 | ··· | p_n. 
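
The chaining idea can be illustrated with a minimal Python sketch. It is only an analogy for the pipe, not a model of Unix or of CCG: the helper pipe and the stage names lower, nonempty and numbered are hypothetical names introduced here for illustration.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A stage consumes a stream of lines and produces a stream of lines.
Stage = Callable[[Iterable[str]], Iterator[str]]

def pipe(*stages: Stage) -> Stage:
    """Chain stages left to right, in the spirit of p_1 | p_2 | ... | p_n."""
    def run(stream: Iterable[str]) -> Iterator[str]:
        # Each stage receives the output stream of the previous one.
        return reduce(lambda s, stage: stage(s), stages, iter(stream))
    return run

def lower(stream):
    for line in stream:
        yield line.lower()

def nonempty(stream):
    for line in stream:
        if line.strip():
            yield line

def numbered(stream):
    for i, line in enumerate(stream, 1):
        yield f"{i}:{line}"

pipeline = pipe(lower, nonempty, numbered)
result = list(pipeline(["Mary", "", "HURT", "herself"]))
```

Nothing inside the pipeline is visible from outside; only the final stream is. This is the sense of opaqueness at issue in the discussion of the pipe.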

It is tempting to think of parsing as one long seamless pipe where every 
individual stage p_i is some parsing action (i.e. rule use, equivalently, for 
CCG, type use). However, this would imply that any intermediate process 
is opaque looking from the outside world. This is most likely not true; for 
example we have catches of breath (or rest in signing), intonational 
phrasing, restarts, interjections, turn taking and giving (either voluntary 
or involuntary), etc. Some stages seem to be available for "repiping", i.e. 
we have p_1 | ··· | p_j1 || p_j2 | ··· | p_k || ··· | p_n, rather than 
p_1 | p_2 | ··· | p_n, where ‘||’ represents a joint in the pipe at which 
some properties must be transparent. 


As the preceding preliminary discussion implies, I believe CCG as a theory 
has something to say about these ‘||’ joints where access is needed, and 
it has to do with the interaction of the seamless lexical projection of types 
onto surface phrases and satisfaction of constraints. The applicative 
structure and dependencies seem to vary systematically in this regard. 


1. Application 


The asymmetry of simply projecting all the features in a combinatory rule 
and sometimes having to stop the projection in application is forced by the 
data. The following four instances among others corroborate the asymmetry 
of feature projection. 


1.1. Reflexives 


Consider again the simple control of grammar-lexicon division in CCG using 
the feature LEX, for "lexical". Steedman and Baldridge (2011) argue that rad- 
ically lexicalizing the reflexives forces a feature such as LEX. The category 
of the reflexive must look for a lexical verb: 


(1)  Mary    hurt          herself 
             (S\NP)/NP     (S\NP)\_LEX((S\NP)/NP) 
             ------------------------------------ < 
                          S\NP 


CCG’s derivations are entirely syntactic type-driven; therefore the syntactic 
type of herself must bear this feature as +LEX, as above, which we could 
also write as ‘\g as before. We need this constraint to avoid reflexive in- 
terpretations of herself and himself in the example John showed Mary her- 
self/himself. They are forced to a different analysis because, unlike true re- 
flexives, they must take focal accent (Steedman, p.c.). Therefore application 
is subject to the following constraint: 
(2)  X/_λ1 Y   Y_λ1   ⇒   X_λ2      (>) 
     Y_λ1   X\_λ1 Y   ⇒   X_λ2      (<) 
     where λi are variables for the value of LEX. 
     λ2 = λ1 if λ1 is specified, λ2 = −LEX otherwise. 
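
The effect of the constraint can be sketched in Python under some loudly stated assumptions: categories are modeled with the hypothetical classes Atom and Fn, the LEX value demanded by a slash is stored in a field lex (None when unspecified), and a hypothetical class Sign records whether an item is itself lexical. The sketch covers only backward application and is not the author's formulation.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Fn:
    result: "Cat"
    slash: str                    # "/" or "\\"
    arg: "Cat"
    lex: Optional[bool] = None    # LEX value on the slash; None = unspecified

Cat = Union[Atom, Fn]

@dataclass(frozen=True)
class Sign:
    cat: Cat
    lexical: bool                 # True for +LEX, False for -LEX

def apply_backward(arg: Sign, fn: Sign) -> Optional[Sign]:
    """Y  X\\Y => X, checking the LEX value required by the slash."""
    f = fn.cat
    if not isinstance(f, Fn) or f.slash != "\\" or f.arg != arg.cat:
        return None
    if f.lex is not None and f.lex != arg.lexical:
        return None               # e.g. a reflexive meeting a derived verb
    # Projection stops here: the result is -LEX unless the slash says otherwise.
    return Sign(f.result, f.lex if f.lex is not None else False)

S, NP = Atom("S"), Atom("NP")
VP = Fn(S, "\\", NP)
TV = Fn(VP, "/", NP)

hurt = Sign(TV, True)                          # lexical verb
herself = Sign(Fn(VP, "\\", TV, lex=True), True)
showed_mary = Sign(TV, False)                  # derived phrase, same category

r_lexical = apply_backward(hurt, herself)
r_derived = apply_backward(showed_mary, herself)
r_default = apply_backward(Sign(NP, True), Sign(VP, True))
```

In this sketch the reflexive combines with the lexical verb but not with the derived phrase of the same category, and an application whose slash leaves LEX unspecified yields a −LEX result.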


It would be projection if λ2 = λ1 necessarily. This seems to be the case for 
B and S. No special treatment has been reported for them in the literature. 
The earlier constraints on combinatory rules, for example those in Steed- 
man (1985), have been replaced by the lexical control of slash modalities 
since Baldridge (2002). The only remaining constraint which has not yet been 
reformulated via modalities is Trechsel's (2000:630) stipulation on forward 
composition for Tzotzil, which is readily translatable to lexical restrictions as 
SNP for the unaccusative verbs and S (NP for the unergative verbs. 


1.2. Supervised learning 


The second example of projection asymmetry arises from a similar special 
treatment of application, for the purpose of learning the CCG categories from 
annotated data. Hockenmaier, Bierner and Baldridge (2004) report the fol- 
lowing from the Penn Treebank: 


(NP-SBJ (NP The woman) 
        (SBAR (WHNP-1 who) 
              (S (NP-SBJ John) 
                 (VP (VBZ loves) 
                     (NP (-NONE- *T*-1))) 
                 (ADVP deeply)))) 


They explain: “If a *T* trace is found and appears in complement position 
(as determined by the label of its maximal projection), a 'slash category' is 
passed up to the maximal projection of the sentence in which the trace occurs 
(here the S-node), hence signaling an incomplete constituent" (Hockenmaier, 
Bierner and Baldridge 2004: 176). The passing of the slash category is shown 
in bold in the tree below, which is their CCG approximation for the same data. 
Its projection stops when the head daughter (e.g. who above) applies. 
NP:NP 
  NP:NP                        the woman 
  SBAR:NP\NP 
    WHNP                       who 
    S                fw:NP 
      NP:NP                    John 
      VP:S\NP        fw:NP 
        VBZ:(S\NP)/NP          loves 
        NP:NP                  *T*-1 

The implicit assumption is that the substring (John loves *T*) is derived 
by CCG's combinatory rules, viz. B here. 
(4)  John           loves 
     NP             (S\NP)/NP 
     ---------- >T 
     S/(S\NP) 
     -------------------------- >B 
               S/NP 
In this range this feature always projects, and the head closes off the 
projection by application, say with the category (NP\NP)/(S/NP) for who. 
No special treatment of projection has been reported for combinator-like 
dependencies in wide-coverage parsing models, where large quantities of 
similarly annotated data are available for training; see e.g. Hockenmaier and 
Steedman (2007) for English, and Çakıcı (2008), Eryiğit, Nivre and Oflazer 
(2008) for Turkish. 
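
The derivation of John loves in (4), and its closing off by who, can be traced with a small Python sketch. The classes Atom and Fn and the function names raise_forward, compose_forward and apply_forward are hypothetical, introduced only for illustration; slash modalities and features are left out.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Atom:
    name: str
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Fn:
    result: "Cat"
    slash: str            # "/" or "\\"
    arg: "Cat"
    def __str__(self):
        def wrap(c):
            return f"({c})" if isinstance(c, Fn) else str(c)
        return f"{wrap(self.result)}{self.slash}{wrap(self.arg)}"

Cat = Union[Atom, Fn]

def raise_forward(x, t):
    """>T: X => T/(T\\X)"""
    return Fn(t, "/", Fn(t, "\\", x))

def compose_forward(f, g):
    """>B: X/Y  Y/Z => X/Z"""
    if (isinstance(f, Fn) and isinstance(g, Fn)
            and f.slash == "/" and g.slash == "/" and f.arg == g.result):
        return Fn(f.result, "/", g.arg)
    return None

def apply_forward(f, a):
    """>: X/Y  Y => X"""
    if isinstance(f, Fn) and f.slash == "/" and f.arg == a:
        return f.result
    return None

S, NP = Atom("S"), Atom("NP")
john = raise_forward(NP, S)                     # S/(S\NP)
loves = Fn(Fn(S, "\\", NP), "/", NP)            # (S\NP)/NP
john_loves = compose_forward(john, loves)       # S/NP
who = Fn(Fn(NP, "\\", NP), "/", Fn(S, "/", NP)) # (NP\NP)/(S/NP)
relative = apply_forward(who, john_loves)       # NP\NP
```

The open NP argument survives type raising and composition untouched, and it is only the application of who that consumes it.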


1.3. Gapping and syntactic abstraction 


The third example of a special treatment for application comes from informa- 
tion structure and focus projection. Steedman (2000b) has proposed a rule of 
decomposition for verb-medial gapping, which in effect does the triple duty 
of function reabstraction, theme narrowing and the revealing of nonlexical 
categories during syntactic derivation: 


(5) X_ι : left′  ⇒  S_ι/$_ι : θ left′    X_ι\(S_ι/$_ι) : λy.left′    (<<) 


The rule is a special case of backward abstraction, X ⇒ Y X\Y. As before, 
intermediate phrases (ι) combine with likewise intermediate phrases to 
establish the information structure, following Pierrehumbert and Hirschberg 
(1990). θ is for theme-marking, and ρ for rheme-marking. The rule accounts 
for verb-medial gapping (6), and avoids anti-gapping; see Steedman (2000b) 
for details. 


(6)  Dexter eats bread,            and   Warren                potatoes 
                                         ---------------- <T   -------- <T 
     (S/NP)/NP   S\(S/NP/NP)             (S/NP)\(S/NP/NP)      S\(S/NP) 
                                         ------------------------------ <B 
                                                 S\(S/NP/NP) 
                                                    Steedman (2000b: 190-1) 

No B-abstraction or S-abstraction has been reported for any language. 
This is not surprising because the dependencies which are projected by com- 
bination rules are functions of lexical specification, whereas reabstraction 
cannot "see" lexical specifications to be sensitive to them; note the lambda 
term λy.left′ in (5). 

It is important to observe that the asymmetry of projection is between the 
classes of rules (dependency versus application), not instances. Decompo- 
sition seems relevant for both kinds of application. The forward variety of 
it must be assumed to maintain a grammatical solution to focus projection 
(Ozge and Bozsahin 2010): 


(7) X_ι : right′  ⇒  X_ι/(S_ι\$_ι) : λy.right′    S_ι\$_ι : θ right′    (>>) 


Interpreted in the context of the question in (8a), the narrowing of focus 
projection in (8b) is achieved by (>>), within the intermediate phrase 
(arabayı kullanıyor). Note the forced appearance of the θ feature in the 
rightward-revealed category, for theme. There would be no focus (rheme) 
narrowing if the context were What does your mother do? This aspect cannot 
be controlled lexically because it is contextual, hence its capture by 
reabstraction is expected. 
(8) a. Anne-n          ne-yi      kullan-ıyor? 
       mother-POSS.2s  what-ACC   use-IMPF 
       ‘What does your mother drive?’ 

    b. (Annem)  H-          (ARABAYI      kullanıyor)  L- 
       my mother            car-ACC F     drive-PROG 
       theme-kontrast       rheme         theme-background 
       [derivation tree omitted] 
       ‘My mother drives the car’             Ozge and Bozsahin (2010) 


Thus the asymmetry between application and the combinatory rules such 
as B and S is maintained, and there is no reabstraction asymmetry between 
backward and forward application with respect to focus projection. 

Before closing this section, I must note that forward abstraction is 
new evidence for syntactic abstraction. Backward abstraction might seem 
construction-specific, as it serves only to mediate gapping in verb-medial 
languages. It has its limits on constituency. For example, (9) cannot have 
a rightward constituent to reveal, because Steedman's (2000b) analysis 
crucially depends on both the subject and the object being backward 
type-raised (i.e. English is considered virtually VSO and lexically SVO to 
avoid anti-gapping and other bad effects), which cannot yield a constituent 
in this example, as shown for the string but I can you. 


(9)  Yippee, you can't see me, but I can you.                  Syd Barrett 


     but   I                    can              you 
           (S/NP)\(S/NP/NP)     (S\NP)/(S\NP)    S\(S/NP) 


However, both Steedman’s backward reveal rule and the forward one above 
do something crucial to syntactic types: they engender a mechanism for fo- 
cus projection and its narrowing; see Ozge and Bozsahin (2010) for some 
discussion. Not surprisingly, such processes are not sensitive to lexical ma- 
terial (but they are certainly dependent on syntactic properties such as being 
an argument versus predicate), hence their reverse process leading to 
abstraction in syntax seems justifiable. The essence of the rule seems 
necessary for 
incorporating that aspect of the syntactic process. 


1.4. Morpholexical constraints 


The fourth and final example of an exception to feature projection in ap- 
plication is reported for external sandhi, including Welsh soft mutation and 
English wanna-contraction. I have in mind Steedman’s (2009) proposal to 
handle them in a unique way. Both processes require that we stop the sandhi, 
which is a finite-state rule in Steedman’s formulation, under forward applica- 
tion, and always project this feature in combinatory rules. 

The sandhi rule also seems relevant to backward application. In Turkish 
noun incorporation, where a morphologically unmarked preposed argument 
is incorporated into the adjacent verb, external sandhi is instigated by the 
SOV verb, i.e. by backward application: the incorporated noun is syllabified 
with the verb as external sandhi (10a–c), if phonological constraints on 
syllables are not violated as in (10d). Brackets in these examples denote 
syllabic 
segments. (10e) shows that the morpholexical rule has limited applicability. 


(10) a. kitap okumak               b. taraf oldu                   Turkish 
        ..[ta][po]..                  ..[ra][fol].. 
        book read-INF                 side be-PAST 
        ‘book reading’                ‘be part of’ 
     c. tuğla presledi             d. taraf tuttu 
        ..[lap][res]..                ..[raf][tut]..  (no sandhi) 
        brick pressed                 side took 
        ‘brick-pressed’               ‘took side’ 
     e. tuğlayı presledi 
        *[yıp][res].. 
        brick-ACC pressed          (no incorporation; no sandhi) 
        ‘pressed the brick’ 

In summary, there is something special about application in feature pro- 
jection, whereas no special care is needed for combinatory dependencies. All 
manifestations of application, the forward and the backward varieties, seem 
prone to this asymmetry. 

Application is also special theoretically because it cannot be conceived 
as a combinator itself for syntax or morphology. Although we can assume 
A ≝ λfλx.fx, i.e. application as a lambda term without free variables, the 
juxtaposition fx is now unaccounted for if we do not take application as a 
primitive and employ A in its stead. This would leave no room to syntacticize 
or morphologize A; we would need a primitive of concatenation or affixation. 


2. Dependency 


In contrast to application, dependency computation seems to be an opaque 
process. We can first have a look at some of the limited degrees of freedom 
afforded by this result, then exploit it to provide an efficient model of CCG's 
computation. 

The invariants in Table 2 combine by application or combination. We can 
also think of them as unary type correspondences, which reveals their de- 
pendency encoding. I now write the semantics of correspondences as well to 
discuss the dependencies. Recall that, in CCG, dependencies manifest them- 
selves in the predicate-argument structure, and syntacticization faithfully re- 
flects them on syntactic types, as shown in Chapter 4. 

I rewrite a fragment of Table 2 as a running example for this chapter. The 
first version is obtained by carrying the category of the nonhead term to the 
right of the arrow, which leaves the semantic head f as the input to deriving 
the right-hand side: 


(11) Semantics-driven encoding of dependencies: 
     X/⋆Y: f         →  X/⋆Y: λx.fx                      (>) 
     X\⋆Y: f         →  X\⋆Y: λx.fx                      (<) 
     X/⋄Y: f         →  (X/⋄Z)/(Y/⋄Z): λgλx.f(gx)        (>B) 
     X\⋄Y: f         →  (X\⋄Z)\(Y\⋄Z): λgλx.f(gx)        (<B) 
     (X/⋄Y)/⋄Z: f    →  (X/⋄Z)/(Y/⋄Z): λgλx.fx(gx)       (>S) 
     (X\⋄Y)\⋄Z: f    →  (X\⋄Z)\(Y\⋄Z): λgλx.fx(gx)       (<S) 


Here I follow the standard practice in computational linguistics, which dates 
back to Partee and Rooth (1983), that categories enter the lexical assignments 
at their “lowest type”. In (11), they are the left-hand sides of the arrow. For 
example, α := (X/Z)/(Y/Z) is a higher-type homonym of α := X/Y in Partee 
and Rooth's sense. A right arrow for a unary correspondence must be inter- 
preted in the current chapter with this assumption in mind. 

The correspondences in (11) follow from adjacency because, if we have 
the configuration A B ⇒ C, we can also get it with A → C/B and B → C\A. 
The dependencies in (11) arise from a semantic (head-driven) strategy of 
translating the CCG rules in Table 2 to the homonyms of their head function. 
For example, in (<B), the head function f is phonologically after g because 
Y\⋄Z: g  X\⋄Y: f ⇒ X\⋄Z. We kept f on the left-hand side of the 
correspondences in (11), which is the head of the dependency in B f g. 

In contrast, the homonyms of the lower types in (12) are motivated by 
phonology because it is always the phonologically first category of a combi- 
natory rule that is mapped to a homonym, which is guaranteed by adjacency. 
It is clear from these correspondences that the first two rows of the phonolog- 
ical strategy (12) and the semantic strategy (11) add nothing informative to 
the set of categories which are at the parser’s disposal, with the exception of 
the phonological translation of (<) in (12), which looks suspiciously like 
forward type raising (more on this later). The last four rows of each strategy 
are informative syntactically. 


(12) Phonology-driven encoding of dependencies: 
     X/⋆Y: f         →  X/⋆Y: λx.fx                          (>) 
     Y: a            →  X/(X\⋆Y): λf.fa                      (<) 
     X/⋄Y: f         →  (X/⋄Z)/(Y/⋄Z): λgλx.f(gx)            (>B) 
     Y\⋄Z: g         →  (X\⋄Z)/(X\⋄Y): λfλx.f(gx)            (<B) 
     (X/⋄Y)/⋄Z: f    →  (X/⋄Z)/(Y/⋄Z): λgλx.fx(gx)           (>S) 
     Y\⋄Z: g         →  (X\⋄Z)/((X\⋄Y)\⋄Z): λfλx.fx(gx)      (<S) 

Current thinking in linguistics is that phonological cues tune in earlier than 
semantics, and they are predictive. This view favors a model of CCG parsing 
which uses (12) for dependencies, rather than the binary rules of Table 2, or 
the semantically-motivated (11). 
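
That the phonology-driven unary homonyms lose nothing semantically can be checked with a short Python sketch. It covers only the lambda terms, not the syntactic types; the names fwd_B, bwd_B, fwd_S and bwd_S and the stand-in functions f, g and f2 are hypothetical.

```python
# Binary combinators on curried functions.
B = lambda f: lambda g: lambda x: f(g(x))        # B f g x = f (g x)
S = lambda f: lambda g: lambda x: f(x)(g(x))     # S f g x = f x (g x)

# Unary homonyms in the phonology-driven style: the phonologically first
# item is turned into a function over the item that follows it.
fwd_B = lambda f: lambda g: lambda x: f(g(x))    # first item is the head f
bwd_B = lambda g: lambda f: lambda x: f(g(x))    # first item is the nonhead g
fwd_S = lambda f: lambda g: lambda x: f(x)(g(x))
bwd_S = lambda g: lambda f: lambda x: f(x)(g(x))

# Stand-in meanings that simply record how they were applied.
f = lambda y: ("f", y)
g = lambda x: ("g", x)
f2 = lambda x: lambda y: ("f", x, y)

composed = B(f)(g)("a")
substituted = S(f2)(g)("a")
```

Whichever item comes first in the string, applying its homonym to the following item yields the same term as the binary rule, so the dependency is preserved while every step looks forward.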

The learning of a syntactic category is the crucial part of acquiring a gram- 
mar, which is a hidden variable problem (Zettlemoyer and Collins 2005), 
where the input is a pairing of a phonological form and an assumption about 
its meaning (the PADS), and the syntactic type is the hidden variable. (12) 
suggests that the learner in the parse-to-learn paradigm first has a grip on the 
type homonyms of the prefixes of a string to associate right or wrong mean- 
ings with parts of it. Thus the string must be available as a structured domain 
for analysis, so that we can hypothesize about the syntactic types from the 
beginning to the end in a sequential fashion. 

All CCG learners operate in the parse-to-learn paradigm, for example 
Villavicencio (2002), Bos et al. (2004), Zettlemoyer and Collins (2005), Çöl- 
tekin and Bozsahin (2007), Steedman and Hockenmaier (2007), Clark and 
Curran (2007). We have seen the basic idea at work in §9.5. They can be made 
to work with (12) to implement the phonological-cues-first idea, provided 
that we can thread application and combinatory dependencies to achieve the 
pipeline effect of (12), which we will do in the next two sections. Presumably 
such parsing models will be easier to train on phonological cues. 

The last four rows of (11) and (12) are all nonredundant. The list in Table 6 
shows that nonredundancy holds even in the presence of crossing modalities 
and powers. The list is a phonological encoding of Table 2. 

The phonological encoding may be the most natural monadic computa- 
tion by CCG because through it all dependencies begin to look forward in 
the string. Notice the main slashes of the higher-type homonyms in Table 6, 
which are in the right-hand sides of the arrow. This encoding’s relation to 
adjacency is evident (recall that we have no phonologically null type assign- 
ments), and this takes us to sequencers in combinatory theory. 

Whether we keep the dependencies separate or phonologically or semanti- 
cally encode them depends on what use we put them to. In all cases, a freely- 
operating T is absent, and this also directly relates to sequencing. This is one 
aspect which stands out in a monadic interpretation of CCG. 


3. Sequencers 


The implication of the results so far is that dependency projection in a pars- 
ing configuration can be an opaque process. Although application requires its 
ingredients to be visible so that idiosyncratic constraints can be imposed on 
it, no such visibility seems required for dependencies. In other words, depen- 
dencies can be shunted into a sequencer, whereas application cannot. 

The combinatory equivalent of a sequencer is Curry and Feys’s (1958) 
composite product, defined as X · Y = BXY. As they proved, it is equivalent 
to a sequencing of X and Y, where X · Y first performs X, then Y, on a 
sequence of arguments. For sequencing to work X must be a regular combinator, 
i.e. it must not change the order of its first argument. Therefore, T cannot 
be a sequencer, because T ≝ λxλy.yx. 
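
A small Python sketch may help to see the sequencing. It assumes the standard definitions of the combinators C (Cfab = fba) and W (Wfa = faa) as examples of regular combinators; the three-place function f is a hypothetical stand-in that records its arguments.

```python
B = lambda x: lambda y: lambda z: x(y(z))     # B x y z = x (y z)
C = lambda f: lambda a: lambda b: f(b)(a)     # C f a b = f b a
W = lambda f: lambda a: f(a)(a)               # W f a   = f a a
T = lambda x: lambda y: y(x)                  # T x y   = y x

# Stand-in head that records the order in which it receives arguments.
f = lambda p: lambda q: lambda r: (p, q, r)

# Composite product C . W = B C W: first C acts on the argument
# sequence (swapping a and b), then W acts (duplicating the new first one).
seq = B(C)(W)
sequenced = seq(f)("a")("b")

# T is not regular: its first argument does not stay in head position.
displaced = T("f")(lambda h: ("head is now an argument", h))
```

With C and W the head f stays put while the arguments are rearranged in two successive steps. With T the would-be head ends up as the argument of what follows it, so there is nothing left for a second step to act on.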

Its import for monadic CCG is that T cannot be part of a monad that con- 
tains Table 6. Take TB fgh, which reduces to f B gh, and there is no possibil- 
ity of reaching a normal form. This would abruptly stop the monad after its 
first step. The relevance of this result to efficient computation will be evident 
in the next section, but first let us look at what more is at stake with T. 
Table 6. Phonology-driven encoding of monadic dependencies (cf. Table 2). 


  X/⋄Y          →  (X/⋄Z)/(Y/⋄Z)                >B 
  Y\⋄Z          →  (X\⋄Z)/(X\⋄Y)                <B 
  X/×Y          →  (X\×Z)/(Y\×Z)                >B× 
  Y/×Z          →  (X/×Z)/(X\×Y)                <B× 
  X/⋄Y          →  ((X/⋄Z)|W)/((Y/⋄Z)|W)        >B² 
  (Y\⋄Z)|W      →  ((X\⋄Z)|W)/(X\⋄Y)            <B² 
  X/×Y          →  ((X\×Z)|W)/((Y\×Z)|W)        >B²× 
  (Y/×Z)|W      →  ((X/×Z)|W)/(X\×Y)            <B²× 
  (X/⋄Y)/⋄Z     →  (X/⋄Z)/(Y/⋄Z)                >S 
  Y\⋄Z          →  (X\⋄Z)/((X\⋄Y)\⋄Z)           <S 
  (X/×Y)\×Z     →  (X\×Z)/(Y\×Z)                >S× 
  Y/×Z          →  (X/×Z)/((X\×Y)/×Z)           <S× 
  (X/⋄Y)|Z      →  ((X/⋄W)|Z)/((Y/⋄W)|Z)        >S″ 
  (Y\⋄W)|Z      →  ((X\⋄W)|Z)/((X\⋄Y)|Z)        <S″ 
  (X/×Y)|Z      →  ((X\×W)|Z)/((Y\×W)|Z)        >S″× 
  (Y/×W)|Z      →  ((X/×W)|Z)/((X\×Y)|Z)        <S″× 
  X/⋄(Y|Z)      →  (X/⋄(W|Z))/(Y/⋄W)            >O 
  Y\⋄W          →  (X\⋄(W|Z))/(X\⋄(Y|Z))        <O 
  X/×(Y|Z)      →  (X\×(W|Z))/(Y\×W)            >O× 
  Y/×W          →  (X/×(W|Z))/(X\×(Y|Z))        <O× 


A set-theoretic formulation makes it easier to see that T is a monoid itself 
in a system of binary combination. Recall Lambek’s (1988) definition of 
type-raised categories, for example S/(S\N) as N^S, where he also noted 
(N^S)^S = N^S. Given three functions f, g and h in the space N^S, we have 
f∘(g∘h) = (f∘g)∘h, where ∘ is binary composition. The identity element of 
the monoid is λfλx.xf for f ∈ N^S. The meaning of this result for our 
present concern 
is that the monoid adds further bracketings of the same string, therefore it 
is not part of the core system, and it must be controlled lexically to stop the 
overgeneration of surface constituency. 

The idiosyncrasy of the syntacticized T (type raising) and the argumenthood 
requirement on its application (it applies to argument types to yield 
functions from functions that require such kinds of arguments) also suggest 
that type raising must continue to operate in a lexically constrained manner. 
The alternative, which is to incorporate T as X/(X\Y) in Table 6, leads to 
nonmild-context-freeness, as Hoffman (1993) showed. Similarly, the backward 
variety X\(X/Y) must be avoided. Thus T’s problems with sequencing are 
compounded by its uncontrollable power of producing categories indefinitely 
without advancing the computation, if left to operate freely. 

Although lazy evaluation can be called in to force a freely operating T to 
terminate in a parsing configuration, the problem with T is not only 
effective computability or disruption of sequencing. An implication of 
Curry and Feys’s (1958) results is that T is not dependency-encoding but 
dependency-preserving: T = CI. Thus Tfa = CIfa = Iaf = af. The combinator 
C encodes a dependency between the head x and its arguments y and z 
in Cxyz, but this is neutralized by I in CIfa. There is no lexical resource 
in I to encode a dependency, which by definition needs a head. 
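
The reduction Tfa = CIfa = Iaf = af can be checked mechanically. The following Python lines are a minimal sketch using curried functions; the sample values f_val and a_fun are hypothetical.

```python
I = lambda x: x                               # I x     = x
C = lambda x: lambda y: lambda z: x(z)(y)     # C x y z = x z y
T = lambda x: lambda y: y(x)                  # T x y   = y x

f_val = 3                                     # stands in for f
a_fun = lambda n: n + 1                       # stands in for a

via_T = T(f_val)(a_fun)                       # a f
via_CI = C(I)(f_val)(a_fun)                   # I a f = a f
```

Both routes simply hand f to a, which is the sense in which T preserves a dependency that is already there rather than encoding a new one.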

It is therefore not surprising that T must be part of the lexicalized gram- 
mar, either built into the lexical categories or operated as a lexical (unary) 
rule, and it must not be in the set of common dependencies that feed into 
application. And without T's disruption of sequencing, other combinators 
manifest monadic computation with respect to application. 


4. The CCG monad 


A monad in Category Theory is a triple (M, η, μ) where M is the type 
constructor, η is the function to inject values into the monad by monadic 
type construction, and μ is the function that threads the computation within 
the monad. All monads can be characterized as below, where we need to fix η 
and μ to get a particular computation (‘→’ here represents a function's type). 


(13) η : κ → Mκ

     μ : Mκ → (κ → Mρ) → Mρ

In our case, μ threads dependency and application. The computation
engendered by CCG is known as the reader monad or the parser monad in
programming languages (see Hutton 2007), which we can define as follows:

(14) Let CCG-M = (M, η, μ), such that

     ηc = λx.(c,x) : Mκ, for c,x : κ,

     μ = S. Thus μa(dU) = Sa(dU) =
     λx.ax(dUx) : Mκ → (κ → Mρ) → Mρ,
     where d is the dependency function, U is the set of dependencies in
     Table 6, and a is application.
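
The role of S in (14) can be illustrated with a small sketch. It assumes curried Python functions as a stand-in for combinators; a_toy and d_toy are hypothetical numeric placeholders for the processes a and dU, chosen only to make the threading of x visible.

```python
# S combinator: S f g x = f x (g x)
S = lambda f: lambda g: lambda x: f(x)(g(x))

# eta c = \x.(c, x): pair a category with what follows it
eta = lambda c: lambda x: (c, x)

# Hypothetical stand-ins: d_toy transforms the input, a_toy combines the
# original input with the transformed one.
d_toy = lambda x: 2 * x
a_toy = lambda x: lambda y: x + y

mu_toy = S(a_toy)(d_toy)        # \x. a_toy x (d_toy x)

print(mu_toy(5))                # 15
print(eta("S/NP")("NP"))        # ('S/NP', 'NP')
```

The same input x reaches both functions, which is what lets the second process in (14) compare the outcome of the first with the original pair.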


Monadic computation has been known in computing since Wadler (1990)
and Moggi (1991). The monadic nature of function application was pointed
out earlier by Shan (2001), who was also among the first to point out the
relevance of monads to natural language computation. Later, Barker and
Pryor (2010) showed that Jacobson’s variable-free semantics constitutes
a reader monad as well (her g and z; see §6.3).

The inner workings of the CCG monad in (14) can be described as follows.
The Mκ types will be pierced into κ types in the monad to do its
computation; note the function type of d, viz. κ → Mρ. The output of the
monad is of type Mρ, which is the sequence containing a singleton result
category because of the uniqueness of the dependencies in Table 6. The
abstraction η injects an ordered pair of categories (of type Mκ) into the
monad. The result of the process d is a monadic homonym of the left
component of x in U, depending on the right component of x. In simpler
terms, if a left-hand side in Table 6 matches a left component of the input
in the monad, and if the right component in the input matches the domain
type of the homonym of the left component, then the homonym and the right
component are returned as a pair. Failure is reported as x.

The result of the monad is the result of process a, which can be forward
application of CCG, backward application, a binary use of a common
dependency in U (Table 6), or failure, reported as ⊥. Function a applies the
result of (dUx) to the input in x. Thus the knowledge of common dependencies
is kept in the monad as its internal affairs.

Some examples can clarify the mechanism. Assume that we inject into
the monad the sequence (S\NP, S\S). The evaluation of the process (dUx)
yields ((S\NP)/(S\S), S\S) by (<B). Function a becomes the forward
application of (S\NP)/(S\S) to S\S. If the input were (NP, S\NP), the process
(dUx) would return x because no common dependency manifests a leftward
nonfunctor type such as NP. The process (axx) then becomes backward
application, where x is the pair (NP, S\NP).

Consider now the input (S/NP, NP). Process (dUx) returns x because
no dependency in U can match NP as the domain type, and (axx) becomes
forward application. The monad would report failure (⊥) for the ordered pair
(N, (N\N)/NP) because neither d nor a succeeds.
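
The examples above can be replayed with a small sketch of (14). It is not the book's implementation: it assumes categories encoded as nested tuples, ignores slash modalities, and uses only two hypothetical entries (dep_fwd_B and dep_bwd_B) as a stand-in for Table 6. The names S_comb, mu, fwd and bwd are likewise illustrative, and None plays the role of ⊥.

```python
from typing import Optional, Tuple, Union

# A category is an atom ("NP") or a triple (result, slash, argument).
Cat = Union[str, Tuple["Cat", str, "Cat"]]
Pair = Tuple[Cat, Cat]

def fwd(res: Cat, arg: Cat) -> Cat:
    return (res, "/", arg)          # res/arg

def bwd(res: Cat, arg: Cat) -> Cat:
    return (res, "\\", arg)         # res\arg

def is_fn(c: Cat, slash: str) -> bool:
    return isinstance(c, tuple) and c[1] == slash

def dep_fwd_B(left: Cat, right: Cat) -> Optional[Cat]:
    # (>B): X/Y becomes (X/Z)/(Y/Z) when the right neighbour is Y/Z
    if is_fn(left, "/") and is_fn(right, "/") and left[2] == right[0]:
        x, y, z = left[0], left[2], right[2]
        return fwd(fwd(x, z), fwd(y, z))
    return None

def dep_bwd_B(left: Cat, right: Cat) -> Optional[Cat]:
    # (<B): Y\Z becomes (X\Z)/(X\Y) when the right neighbour is X\Y
    if is_fn(left, "\\") and is_fn(right, "\\") and right[2] == left[0]:
        y, z, x = left[0], left[2], right[0]
        return fwd(bwd(x, z), bwd(x, y))
    return None

U = [dep_fwd_B, dep_bwd_B]          # assumed two-entry dependency table

def d(deps):
    """Process d: (homonym, right) if a dependency matches, else x itself."""
    def step(x: Pair) -> Pair:
        left, right = x
        for dep in deps:            # naive linear search over the table
            homonym = dep(left, right)
            if homonym is not None:
                return (homonym, right)
        return x
    return step

def a(x: Pair):
    """Process a: combine the outcome y of d, given the original input x."""
    def apply_to(y: Pair) -> Optional[Cat]:
        left, right = y
        if is_fn(left, "/") and left[2] == right:
            return left[0]          # forward application
        if y == x and is_fn(right, "\\") and right[2] == left:
            return right[0]         # backward application, only if d failed
        return None                 # failure
    return apply_to

S_comb = lambda f: lambda g: lambda x: f(x)(g(x))
mu = S_comb(a)(d(U))                # \x. a x (d U x)

print(mu((bwd("S", "NP"), bwd("S", "S"))))         # ('S', '\\', 'NP')
print(mu(("NP", bwd("S", "NP"))))                  # S
print(mu((fwd("S", "NP"), "NP")))                  # S
print(mu(("N", fwd(bwd("N", "N"), "NP"))))         # None
```

In this sketch the dependency table is consulted only inside d, and the caller sees a single binary operation mu, which mirrors the claim that the common dependencies are the monad's internal affairs.
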

To avoid confusion of monadic CCG derivations with standard CCG
derivations which decorate rules on the right-hand side of a line of
derivation, I index the monadic derivations on the left-hand side of a line,
and write the monadic combinator that led to the successful application of (14):

(15) a.    Wittgenstein   loathed      and Kafka adored mentors.
           S/(S\NP)       (S\NP)/NP
        >B --------------------------
                    S/NP
     b.    Articles which I will   file     without        reading
                                   VP/NP    (VP\VP)/Cing   Cing/NP
                                         >B ------------------------
                                                  (VP\VP)/NP
                               <S× ----------------------------------
                                                 VP/NP


Example (15a)’s inner workings can be fleshed out as a two-stage process of
one-step computation by CCG-M, as in (16).


(16)    Wittgenstein          loathed
        S/(S\NP)              (S\NP)/NP
     >B ....................
        (S/NP)/((S\NP)/NP)
        ..................................
                    S/NP


It first applies (>B) of Table 6 to S/(S\NP) to get (S/NP)/((S\NP)/NP) by
process d, then forward-applies it to the category of loathed to yield S/NP
by process a. This is binary B as a two-stage process, shown in dotted lines
above.

There is no unary application of the combinatory dependencies by the
monad; the unary dependencies of process d must always thread through
process a. Allowing unary application would be a dangerous practice, because
we know that unary B must not operate freely in syntax. I repeat the relevant
examples, where (17b) attempts unary B:


(17) a. I   think   that      Wittgenstein might have liked Kafka
            VP/S'   S'/Sfin   Sfin
     b. *I think   Wittgenstein   that                might have liked Kafka
                   NP             S'/Sfin             Sfin\NP
                                B ------------------
                                  (S'\NP)/(Sfin\NP)
                                  -------------------------------------------
                                  S'\NP

The only monadic dependency that can force (14) to combine ‘that’ with
‘might have liked Kafka’ is (>B×) below (repeated from Table 6):


(18) a. X/×Y → (X\×Z)/(Y\×Z)                  (>B×)
     b.     that       might have liked Kafka
            S'/×Sfin   Sfin\NP
        >B× ---------------------------------
            S'\NP


However, this combination requires the lexical assumption S'/×Sfin for the
complementizer, which is empirically inadequate: we cannot derive the
following fragment.


(19) the field I think   that       Kafka liked
                         S'/×Sfin   Sfin/NP


We can radically lexicalize the contrast in (18) and (19) in the category of that
without further assumption, which must be S'/⋄Sfin, as standardly assumed in
CCG:


(20)    that       Kafka liked
        S'/⋄Sfin   Sfin/NP
     >B ----------------------
        S'/NP

Likewise, a freely operating unary S is dangerous. Consider the Welsh
examples again, from Awbery (1976: 39). Although the category (S/S')/NP
is sound for the complement-taking verbs (21a), the word order instigated by
a unary S from this category is ungrammatical (21b). Welsh is strictly VSO,
and the verb must avoid unary S.


(21) a. Dymunai     Wyn   i Ifor ddarllen llyfr.              Welsh
        Wanted      Wyn   for Ifor reading (a) book
        (S/S')/NP   NP    S'
        ‘Wyn wanted Ifor to read a book.’
     b. *Dymunai          ddarllen llyfr   Ifor
         (S/S')/NP        S'/NP            NP
       S ----------------
         (S/NP)/(S'/NP)

Thus we can assume dependencies to operate freely within the monad,
where they only serve as an input to binary juxtaposition. The same can
be said about combinators, which no longer decide rule choice and simply
project dependency encoding.

Combinatory modalities encoded in slashes continue to do nonredundant
work in the syntacticization of monadic dependencies. For example, given the 
sequence (S\ NP, S\ ,S), process (d Ux) can only produce ((S\ ,WP)/(S\,S), 
SN,S), not ((S\ NP) /(S\ S), SN.S); cf. the definition of (< B) in Table 6. The 
process a of the monad fails because the application (S\,NP)/(S\S) S\,S 
fails. Thus the following expected behavior is respected: 


(22) * player that shoots | and he misses Baldridge (2002) 
(NN.N)IGINP) SNP (S\,S)/S S 
— WS 
* SN NP 


Likewise, the configuration (SV,S, S.S) fails to make use of the dependency 
encoded by (< B) because no left member of Table 6 can match S\ „S. There- 
fore the relevant monadic relation among the slash modalities is that the input 
to the dependency must be a supertype, as before. 


5. Radical lexicalization revisited


No combinatory dependency in Table 6 relies on or introduces a star modal- 
ity. This move makes all the slash modalities truly lexical because we no 
longer need to write the sole combinatory rule of monadic combination, (14), 
with modalities. Modalities only encode the differential lexical syntacticiza- 
tions of semantic dependencies, for example harmonic versus disharmonic 
composition, or no composition. 

It is not surprising that the star modality never appears in the repertoire 
of common dependencies in Table 6: it does not encode a dependency at all 
because it cannot involve a syntactic combinator. This is explicit in a monadic 
CCG. 

The parsing configurations for imposing the special restrictions on
application, some of which are listed in §1, are uniquely identifiable in a
monadic CCG as (axx). This is the condition in which all dependencies fail,
and juxtaposition is the only possibility left for combining. This is an
important source of information for the oracle, because it needs a limited
window of parsing contexts, in addition to individual word statistics and some
transitional probabilities, to be able to decipher the relevance of special
constraints. We can refine this configuration as (a(X/Y,Y)x) and (a(Y,X\Y)x),
to distinguish the unique conditions in which forward and backward application
are possible. These slashes do not need modalities (we can say they bear the
least restrictive type ‘·’) because application is already implicated by x.
Therefore, there is only one primitive of combination, viz. the function μ in
the monad (14).

The cryptic notation of monadic CCG is for a good cause. It manifests the
same dependency as the ternary, binary and unary equivalents in the standard
notation, but embodies (i) phonological precedence, (ii) semantic headness
and (iii) the single slash of combination (the always-forward-looking main
slash without modalities), all at the left edge of a combination. A comparison
of the alternatives for backward crossing composition below shows that this is
indeed the case.


(23) Y/×Z: g   X\×Y: f   Z: a → X: f(ga)                      (ternary)
     Y/×Z: g   X\×Y: f → X/×Z: λx.f(gx)                       (binary)
     Y/×Z: g → (X/×Z)/(X\×Y): λfλx.f(gx)                      (unary)
     (X/×Z)/(X\×Y): λfλx.f(gx)   X\×Y: f → X/×Z: λx.f(gx)     (monadic)


For example, the directionality of all functions and arguments is preserved.
Compare the ternary version with the monadic one. Z's directionality is for- 
ward, Y's directionality is backward, and X as the result is not associated 
with a directionality. All of these are maintained in the monadic version. The 
head functor, f, is anticipated in the monadic variety from the phonologically 
earlier string. 
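
That the four formulations in (23) deliver the same meaning can be checked with a short sketch. The functions f and g and the argument a_val below are hypothetical numeric stand-ins for the semantics of the three categories; only the shape of the lambda terms is taken from (23).

```python
# Hypothetical meanings for the three pieces in (23)
g = lambda z: z + 1        # semantics of Y/Z
f = lambda y: y * 10       # semantics of X\Y (the head functor)
a_val = 4                  # semantics of Z

# Ternary: all three pieces at once
ternary = f(g(a_val))

# Binary: composition of the two functors, awaiting the Z argument
binary = (lambda f, g: lambda x: f(g(x)))(f, g)

# Unary: homonym of the left category, awaiting the head functor f
unary = (lambda g: lambda f: lambda x: f(g(x)))(g)

# Monadic: the homonym is forward-applied to the category on its right
monadic = unary(f)

print(ternary, binary(a_val), monadic(a_val))  # 50 50 50
```

The unary homonym is built from the phonologically earlier piece g alone and anticipates f as its first argument, which corresponds to the left-edge placement discussed above.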

We can assume for the benefit of the computationalist treatments of
language acquisition that all three aspects (i-iii) above are conveniently
located in the first category, and that the monadic version can be compiled
out from the standardly assumed binary version.

All syntactic dependencies are forward-looking in Table 6, as one would 
expect from a phonology-driven base for a strict competence grammar. All of 
them arise from the semantics of combinators noted on the far right, and all 
of them are forward juxtapositions, as expected from the syntactic correlate 
of the phonological attachment. 

Finally, note that in the CCG monad of (14) the dependency computation
by process d terminates in O(k) time through a naive search algorithm, where
k is the size of the set (a constant) in Table 6, if the set does not contain Y, K or
I. No data have been forthcoming for a syntacticization of these combinators.


6. Monadic results and CCG


As is evident from Table 6, the monadic grammar described in this chapter 
is functionally equivalent to CCG. It avoids unary use of combinators, and it 
is forced to keep T out of the monad, which is similar to CCG's substantive 
constraints on type raising. So what good is a monadic CCG? The monadic 
perspective imports several results to CCG, as follows. 

The asymmetry of application and dependency in feature projection sug- 
gests potentially different treatments of these aspects. The dependencies must 
always project (they are the opaque part of the monad) whereas application 
can "close off" a projection for a specific feature. These are different kinds 
of parsing actions. This is explicit in a monad. The distinction seems crucial 
for interfacing parsers with other components of language processing, for ex- 
ample with inference systems, learning systems or with systems of discourse 
and pragmatics. 

Application in CCG can be reduced to a single primitive of parsing action,
viz. juxtaposition, as originally intended by Schönfinkel (1920) for
combinators almost a century ago. This potential simplification, at least in
theory, shows that CCG adds no auxiliary assumptions to engendering
constituency, dependency and structure from order alone.

The binary syntax of CCG follows not only from empirical concerns over 
unary and ternary B and S, but also from juxtaposition itself. The reason is 
as follows. If CCG is indeed monadic, then dependency projection must be 
internal to the monad, therefore the dependencies that are shunted into the 
monad cannot differ in arity. Nor can they combine by themselves if their 
output must feed into a primitive. The only noncombining unary-operating 
combinator is T, which does not instigate a dependency, and which is itself a 
monoid with severe limitations. We have good mathematical, computational 
and linguistic reasons to leave it out of the syntactic combiners. The last two 
aspects have been known for quite some time (e.g. Steedman 1987, 2000a, 
Komagata 1997, Hoffman 1993, Eisner 1996). The monadic perspective re- 
lates the two aspects to the mathematics of sequencing. 

We would not expect to see in natural language syntax the kinds of
dependencies Smullyan (1985) attributed to his admittedly odd combinator birds:
Finch, Owl, Queer Bird, Quirky Bird, Robin, Turing Bird and Vireo. Like
Thrush (T), they are irregular combinators hence not sequencers. (A
combinator is regular if it does not change the order of its first argument.)
Although we can conceive of a monadic organization of T (which is a monoid)
and the set of common dependencies in Table 6 (which is a monad), in the form
of a layered monad (Filinski 1999), which is to say that T can prepare input to
the monad in (14), or apply unarily to its result, there is no indication that
other spoilers add up to a monad with that layered monad. Considering the fact
that T is part of the minimal apparatus BTS which captures the unorthodox but
fully interpretable constituencies, these kinds of dependencies do not seem to
be relevant to natural language computation.

Hoyt and Baldridge (2008) derive the binary rules of CCG from unary
B, S and T. Monadic CCG suggests that going the other way, i.e. deriving
the unary dependencies from combinatory rules to factorize dependency and
application, is revealing too, theoretically and empirically. For one thing, we
must assume the “derivational oracle” of the same derived meanings to be the
speech data themselves, i.e. tones, tunes, stress and pitch accents. The reasons
are as follows.

Hoyt and Baldridge introduce on the formal side inert slashes to do normal 
form parsing, and to derive the CCG rules from unary combinators. They also 
need a structural postulate in lieu of a switch to do normal-form versus left- 
branching parsing. We would like to be able to assume that the data contains 
the right source for disambiguation. 

This is a forced assumption in a monadic CCG. Consider an inert-slash
formulation of a homonym of (>B), taken from Hoyt and Baldridge:

(24) X/Y: f → (X/!Z)/(Y/!Z): λgλx.f(gx)

Monadic dependencies are opaque inputs to function application, thus we
would be forced by this rule to introduce the antecedent-government
semantics of ‘!’, equivalently the ‘-LEX’ restriction on the slash following
Steedman’s (1996b) semantics for it adopted by Hoyt and Baldridge. Now we have
to introduce ‘!’ to all (>B) homonyms of X/Y. The rule above would force
the monadic homonym to always require Z to be nonlexical, hence this
constraint would not necessarily arise from a lexical category as one would
expect.

An inert slash is fine as an option in the lexicon, but its introduction by
a monadic homonym is problematic. Using the inert slash to avoid spurious
derivations must then be reconsidered in light of the richness in the data.
Take [[John likes] Mary] and [John [likes Mary]]. Normal-form parsers (for
example that described in Eisner 1996) would eliminate the first alternative
unless it is in a substring such as the lady I believe John likes, but these
derivations are not spurious until we know the context and intonation in which
the utterance took place:


(25) a.  Who does John like?
     b.  Why does John avoid Mary?
     a’. [[John likes] Mary]
     b’. [John [likes Mary]]


Both bracketings in (25a’-b’) are possible analyses as an answer to the first
question. The first analysis is spurious for the second question. We can
assume, following Steedman (2000a), that the intermediate phrase boundary
tones in an answer to the second question would not allow the bracketing
[John likes] anyway. They have their own syntactic type, and in a modalized
CCG, these types could not be composed over; they are S$\⋆S$.

It is mainly the text data, i.e. information loss, that should make us wary 
of spurious derivations. Then we are left with spurious derivations within 
an intermediate phrase to worry about, which are related to focus projection 
and quantifier scope as well (Prevost 1995, Komagata 1999, Steedman 1999). 
Therefore it might be preferable to filter them out only after they are engen- 
dered, as done by Vijay-Shanker and Weir (1990), rather than avoid them 
syntactically as Eisner (1996) does. We can be aware of these consequences 
when we derive the unary dependencies from binary ones, as in the monad. 

Monadic grammar might also suggest a principled way to answer the
following question: why is the only kind of syntactic abstraction related to the
primitive (juxtaposition)? Why can't we have B-abstraction, say in the
reverse direction of (>B), in the universal syntax of CCG?

Nothing can undo or redo a derivation to the extent of reconsidering the 
projection of lexically specified semantic dependencies. This is what com- 
binators do on the PADS side of the words in the lexicon, which is directly 
reflected on their syntactic type. This follows not from a principle or a stip- 
ulation, but from the inherent asymmetry of sequencing the processes of de- 
pendency projection and application. The asymmetry is explicit in a monadic 
grammar. 

Finally, let us reconsider the computational problem of language
acquisition (§9.5) from a monadic perspective.

CCG has concentrated so far on head-driven approaches to the
acquisition of categories (e.g. Niv 1994, Villavicencio 2002, Zettlemoyer and
Collins 2005, Çöltekin and Bozşahin 2007). A left-dependent monadic
grammar forces us to follow a phonological line exemplified in Table 6.
Semantics helps to narrow down the hypothesis space, thus the proposed sketch
assumes, following Steedman and Hockenmaier (2007) and Chomsky, that
the root of the problem is grammatical bootstrapping rather than syntactic or
semantic bootstrapping.

The idea of the sketch coincides with a remark by Chomsky, quoted as 
personal communication by Hornstein (1995): 


The basic point seems to me simple. If a child hears English, they [sic] pick 
up the phonetics pretty quickly (in fact, it now turns out that many subtle dis- 
tinctions are being made, in language specific ways, as early as six months). 
The perceptual apparatus just tunes in. But if you observe what people are 
doing with language, it is subject to so many interpretations that you get only 
vague cues about LF. 


This is entirely consistent with the computationalist view of language
acquisition outlined in §9.5. Recall that in that way of thinking the so-called
universal grammar, in present terms the invariants and the constitutive
principles for categories, sets up the prior probabilities of day one. The
child's confidence in a category for strings, a posterior probability, is
updated by a learning scenario where possible derivations set the stage for the
likelihood, fostered by priors updated by experience.

Monadic grammar's contribution to this process is making phonology the 
driving force, where the left edge is not necessarily the semantic head as 
can be seen in Table 6. In any belief update on categories, we first see the 
left edge of a derivation, which is temporally the first part of a combinatory 
context. But the whole process cannot depend on phonology, because a left- 
dependent has many potential results the resolution of which depends on the 
right-dependent (i.e. expectation), and on the child's beliefs about the cate- 
gories, that is, on her lexicon. 

The potential result can be ambiguous only if the child's assumptions
about the strings are ambiguous, because monadic application is always
functional, given two adjacent types. Therefore any source of ambiguity must
emanate from lexical types, or be forced on the system from the outside, to be
handled by an oracle. The process in between is an algorithm, which I wrote
as the monad CCG-M.


Chapter 11 
Conclusion 


We started with Schönfinkel, then moved to Chomsky, Curry, Lambek,
Geach, Bach, Montague and Gazdar, and reached Steedman, Szabolcsi and
Jacobson as the progenitors of rule-to-rule semantics in applicative syntax, be
it natural or formal. We ended with Schönfinkel through monadic grammar,
where there is only one rule of syntactic combination. We dealt mostly with
combinatory matters and only occasionally with set-theoretic ones, which
deserve a book of their own.

Schönfinkel's ingenious method of variable elimination reveals adjacency
as the sole basis of semantics, which, by virtue of Steedman's
syntacticization, is also the sole basis of syntax. Ades and Steedman's (1982)
Adjacency Corollary is an independent discovery of syntactic interpretability by
juxtaposition alone, where they provide the first syntacticization of B. Geach
(1972) is perhaps the first mention of combinators in syntax, where he follows
Quine rather than Schönfinkel in variable elimination. I will come back to this
point shortly.

It seems clear that Steedman's program is not the elimination of vari- 
ables but keeping adjacency as the only base for syntax, which exports di- 
rect and immediate interpretability to constituents. His choice of LF as a 
level of representation attempts to resolve some unsettling issues of imme- 
diate interpretability, namely that of pronouns and scope variation, precisely 
because their semantics do not seem to arise according to Steedman from se- 
mantics of order alone. The variable-free semantics of Jacobson appears to 
have a different agenda, where adjacency in surface structure can be compro- 
mised, such as by a potential consideration of wrap for the benefit of (almost) 
interpretation-ready semantics, given some model-stage storage for binding 
and scope. Here adjacency is a cherished assumption but not a must. The key 
issues in the debate appear to be the impact of type-shifting rules on com- 
putational efficiency, the predictive force of positing or not positing an LF, 
and the unsettled nature of intermediate scope readings and others that fall 
between the cracks in scope-taking. 

Combinators are not alone in insisting that only order leads to structure in
cognitive science. Elman’s (1990) simple recurrent networks take the notion
of time out of input representations, and predict its structure from sequential
representations. The lesson we learned from this kind of connectionism is
not only that symbols need some representational support, but that the
inherent asymmetry of sequential representations can change the way we look at
cognitive problems. The same can be said about combinators.

Schönfinkel's desire to find the foundations of mathematical logic has
become a linguistic theory in CCG in which the only primitive is his
Schönfinkelization of argument-taking objects by which not only arguments but
functions can be passed on as values. Curry's similar aspirations have given
us functional programming par excellence.

Curry's brief foray into linguistics, Curry (1961), suggested another way 
of handling natural language syntax-semantics where a combinatory calculus 
drives the logical aspects (he had called it the tectogrammatical level). A sep- 
arate syntactic calculus (his phenogrammatical level) works on the surface 
structure engendered by words and phrases. This line of research culminated
in what is known as Applicative Grammar (Shaumyan 1977, 1987) and
Convergent Grammar (Pollard 20082). 

It is interesting that Chomsky (1961) was in the minority in the famous
symposium that was held in New York on April 14-15, 1960, which also
included papers on mathematics and language by Curry, Halle, Harary,
Hockett, Jakobson, Lambek, Mandelbrot, Putnam, Quine and Yngve, among
others. He suggested, contra Curry and many others in the meeting (Lambek
excluded), a unification of grammatical description, rather than having
several syntaxes. I believe he was right to insist on this approach, although his
own theories took a winding road in the matter. Remember the kernel
sentences versus derived ones, the optional versus obligatory transformations,
cyclicity and rule ordering, move-everything versus move and merge, deep
and surface structures versus interfaces. The recent convergence of radical
lexicalism and minimalism suggests that we now have fewer degrees of
freedom to hypothesize about possible categories, therefore about possible
grammars, and we can import each other's results.

The amount of semantics we expect to squeeze out from the syntactic
categories is crucial in this debate. If we keep our radar too narrow, as in
Chomskyan transformationalism, it seems that we need to make an array of
auxiliary assumptions. If we open wide, we would have no choice but to do
syntax with semantic types, and not even Montague went as far as that. The
middle ground seems to be the rule-to-rule hypothesis of Bach, which was
implicit in Chomsky's early writings, which, once radically lexicalized, puts
a natural limit to what kind of semantic types can be put in correspondence
with the syntactic ones. It seems that adjacency can serve both ends without
extra assumptions.

It is also worth noting that, if the next simplification in transformation- 
alism is the elimination of move, as some practitioners of the theory have 
already proposed (Epstein et al. 1998), then what we will get is essentially 
some version of categorial grammar, modulo morphology. (Distributed Mor- 
phology seems to fill that gap nicely, although its computational properties are 
understudied at the moment. Autosegmental morphology of McCarthy 1981 
is better-known in this respect; see e.g. Bird and Ellison 1994, Kaplan and 
Kay 1994.) Epstein et al. indeed acknowledge that on the semantic side the 
states of affairs would look very much like a Montagovian categorial gram- 
mar (Epstein et al. 1998: 13), but with some effort to bring in the syntactic in- 
structions about compositional semantics as a residue of derivations, namely 
the cyclic delivery of partial results. The point of CCG is that semantics is 
available at any time. 

Four independent developments, namely Chomsky's formalized notion
of grammar, Lambek's inauguration of radical lexicalism, Schönfinkel and
Curry's conception of semantics in order, and Steedman and Szabolcsi's
syntacticization of the same, transformed Chomsky's ‘rule of grammar’ to
‘category of a word’, and ‘knowledge of language in grammar’ to ‘knowledge of
words’.

Radical lexicalism as first demonstrated in the 1960 conference by Lam- 
bek grew out of the unification of the grammatical description, and its pre- 
dictive powers for possible linguistic categories far outweighed the simplicity 
and elegance arguments of the multi-level syntactic approaches with “purer” 
strata. It is largely a theoretical debate which is not supported computation- 
ally. 

In turn, computationalism as manifested in the combinatory knowledge 
of words puts some flesh in Wittgenstein’s theory of meaning-is-use, by re- 
flecting a personal history of word usage, both personally and per word, its 
potential misunderstandings, but no misrepresentation of it. Fallible knowl- 
edge is genuine knowledge explicitly represented in a category. The result 
that it must incorporate some detailed statistical knowledge in tandem with 
combinatory knowledge should not be surprising to anyone who has followed 
the research in language acquisition, machine learning and computational lin- 
guistics. 

We can now go back and study Quine’s appraisal of Schönfinkel's work. I
repeat Quine's commentary cited in the introduction:


It was letting functions admit functions generally as arguments that
Schönfinkel was able to transcend the bounds of the algebra of classes and
relations and so to account completely for quantifiers and their variables, as
could not be done within that algebra. The same expedient carried him, we
see, far beyond the bounds of quantification theory in turn: all set theory was
his province. His C, S, U and application are a marvel of compact power. But
a consequence is that the analysis of the variable, so important a result of
Schönfinkel's construction, remains all bound up with the perplexities of set
theory. Quine (1967: 357)


His own solution to variable elimination, Quine (1966), needed a meta-theory
to avoid the problems he had pointed out, whereas Schönfinkel's theory was
an object-level theory, which led to direct syntacticizability without levels
or strata. His understandable concerns for set theory are not imported into
this syntacticization, because this is combinatory syntax, not set-theoretic.
Semantic objects are not sets but predicate-argument structures embodying
semantic dependencies, which are structural domains in need of a primitive
for construction.

By direct import from the elimination of variables at object language,
constituents are built by syntacticization of the same primitive. This might help
us see the sister theories of CCG such as Construction Grammar (Goldberg
1995, Croft 2001) and Dependency Grammar (Hays 1964, Hudson 1984,
Mel'čuk 1988, Kuhlmann and Nivre 2006) as wanting an explanation why
we have the constructions we observe in languages, and why we see only
certain kinds of dependencies and constituencies. I have exemplified quite
a number of the last kind, ranging from traditional constituents such as VP,
NP etc., but also the unorthodox strings that seem to have immediately
interpretable subpieces thanks to combinators, such as I say three mathematicians
in ten and you claim four philosophers in five prefer corduroy, or I can, and
perhaps you will, try to sing ‘Flaming.’

The combinatory process has its limits because it cannot make a
compositionally uninterpretable fragment a constituent. It cannot call a fragment
a constituent and not immediately deliver a compositional meaning for it. I can
you sing is a word salad although some parts of it are not, and I can you
is parasitically interpretable in a gapping environment such as Barrett's You
can't see me, but I can you.

Whether that makes it a constituent is hard to tell from a combinatory
perspective, because the point of combinatory semantics arising from order
alone is that each constituent has the stuff to deliver whatever (partial)
meaning is available. No such doubts arise about the bracketed substring in
(three mathematicians in) ten; it is a nonconstituent.

There are impossible words too, such as those with Y semantics, and
suspect words, for example with K semantics. Some dependencies are more
unlikely than suspect, given the other assumptions of a lexicalized grammar.
For example, it is hard to conceive how John expects that Barry could mean
‘John expects Barry to expect’. For it to mean that we need S semantics where
expect-like verbs can be the targets of parasitic extraction, in pseudo-English
something like expect_i from me that I imagined to _i without wishing to _i.
Noun extraction is common, but verb extraction, especially of this kind, is
unattested. The theory aspires to be explanatory by being as specific as it can
about impossible constituents, and showing explicitly how the possible ones
can be constructed. Unlikely ones are a conspiracy of the types in a radically
lexicalized grammar. In a way, the grammar as a whole symbolizes making
sense of the world of words in their possible combination.

CCG's neo-Humean answer to the natural limit on constituency and de- 
pendency is that all syntactic behavior arises from the self-limiting nature of 
codetermination of syntax and semantics in a radically lexicalized grammar 
which faces limited combinatory possibilities. That is all adjacency can of- 
fer with less than a handful of noninter-definable dependency encoders and 
a fully lexicalized grammar. Furthermore, we get the immediate assembly 
of dependency structures for free by the process of syntacticization, and that 
should be a good thing. 

The emerging BSO family epitomizes composition because it is of the 
form λx.f ⋯(g ⋯x⋯) in binary. The members of the family represent action 
orientation (the predicates are known), and object opaqueness (the argument 
is abstracted over). They are also known as sequencers. The other family, 
T, represents action opaqueness and object orientation because it is of the 
form λP.P object′. It is not a sequencer, but it is a facilitator of sequencing, as 
the monadic perspective showed. Steedman (2002) relates the first family to 
action planning, and the second to affordances. 

Taken together with the other ingredients of human cognition, most impor- 
tantly, awareness of other minds and their affordances, they provide a simple 
ground for semantic recursion and its syntactic reflex without entangling our- 
selves in the debate about the necessity of syntactic recursion (recall the lack 
of the YKI family among the potential candidates for syntacticized dependen- 
cies, which means that, at least in theory, syntactic recursion is not necessary 
to capture semantic recursion). Therefore, it seems possible that language and 
other cognitive activity in primates can be related evolutionarily if seriation 
is the key. 

I will close the book by projecting back in time about adjacency. A 
speculation-wary reader might consider this point to be the end of the book. 
I will draw on some proposals and add a bit of speculation of my own 
about whether this alternative foundation for grammar (order and its seman- 
tics giving us limited constituency and dependency in syntax) has something 
to add to the studies on language evolution. 

Perhaps, but with a caveat, and with some hesitation. First we must re- 
member that Darwin had called his book The origin of species, not The origin 
of life. The diversification and evolution of languages once we have acquired 
the hereditary capacity for language with big L appears to be a different mat- 
ter than how this seemingly unique capacity came about in the first place; 
see for example Knight, Studdert-Kennedy and Hurford (2000) for extensive 
discussion, and the ensuing debate. I will concentrate on the emergence of 
language with big L. 

Take Chomsky's views on the topic, which suggest no intermediate forms 
of language, Bickerton's (1990) saltational view, Jackendoff and Pinker's 
(2005) Baldwinian adaptationist view, and Deacon's (1997) Baldwinian view 
without a universal grammar. The first three arise from the syntactic structure- 
dependence of syntax, and Deacon's view seems congenial to the emergence 
of type-dependence as manifested in all categorial grammars because the 
word does most of the work. Recall that knowledge of words is not a sim- 
ple competence of lexical look-up in the present discussion; it is combinatory 
knowledge, that is, a piece of syntax. 

Chomsky's view is not surprising because the phrase structure tree with 
possibly empty elements in it seems to be such a unique source (not even 
the transformationalist lexicon is constructed from the same source) that we 
can hardly expect to see some precursors or progeny in other cognitive activities. 

Recall also that recursion is everybody's assumption in semantics, and 
syntactic recursion is something we can live without. It is unhelpful to take 
syntactic recursion as an empirical fact and build a theory of language on it, 
including its evolution (see Hauser, Chomsky and Fitch 2002). Genuine syn- 
tactic recursion is depicted in (1a) alongside semantic recursion (1b) to show 
the difference. (1a) is a direct syntacticization of Y semantics whereas (1b) is 
semantic recursion as a tree. Note also that (1c) is not the same as (1a); one 
is an anaphoric dependency and the other is a recursive dependency. It seems 
safe to say that no language has demonstrated a dependency of type (1a). 

Bickerton’s (1990, 1996) protolanguage might appear to be similar to 
adjacency, perhaps to the applicative fragment without combinatory depen- 
dencies, but that fragment also gives us context-free dependencies as Bar- 
Hillel, Gaifman and Shamir (1960) proved. Maybe that is what it was, maybe 
not. It seems to go over and above what Bickerton intended as protolan- 
guage, because we have reasons to believe that context-free dependencies 
go a very long way in capturing most of the dependencies we see in today’s 
languages; GPSG was one bold attempt at this task (see Gazdar et al. 1985). 
The argument-taking fragment sketched in the beginning of the book does not 
seem to be the niche for protolanguage either because it arises from the same 
base as combinators, which makes it unlikely that the emergence of language 
as a combinatory faculty is saltational as Bickerton suggested. 

Although Chomsky, Bickerton and Pinker differ in many ways about the 
origins of language, they share the same assumption that universal grammar, 
for them being a language-specific set of instructions about syntax, grows 
into an adult-state grammar from an initial state. The knowledge in the uni- 
versal grammar must include—as of 2009: the syntactic principles, merge, 
move, check, select, numerate, empty category governance, functional cat- 
egories and their management, syntactic structure-dependence, and several 
parameters, either abstract or cognitively realized—the latter variety is en- 
dorsed explicitly in popular writing (Baker 2001, Yang 2006). We should 
assume that it comes with some allotment for bilingualism and trilingualism, 
along with some precautions for potential conflicts among parameter values 
or in their order of valuation (recall that there are arguments for a universal 
order of parameter setting by Baker). 

The computationalist alternative to parameter setting is the exponential 
decay of probabilities as experience is accumulated, not over a long period, 
but within the confines of a few related experiences, which might give the 
appearance of a sudden switch setting, as Steedman and Hockenmaier (2007) 
argued. Some proposed sequencing of parameter valuation, such as the pri- 
macy of head-directionality in Baker (2001), has a head start in a radically 
lexicalized grammar, but not as an on-off switch. It is encoded in every single 
linguistic hypothesis about syntactic knowledge of words. (Or we can turn 
the table around and say that combinatory theory predicts head-directionality 
to be the primary parameter in a theory of parameters; the lack of clear trends 
in the setting order of other parameters in Baker's repertoire seems to suggest 
that they are more about lexical organization, hence about lexical syntactic 
types, e.g. the ergativity parameter and the serial verb parameter.) 

This argument for an alternative view raises the following question: How 
can we assume every single hypothesis to carry directionality when it is much 
more convenient to set it for all of them at once? We can calculate a child's po- 
tential of making sense of the world if she thinks half the verbs she hears are 
SVO and the other VSO in a language like English. Insisting on her VSO hy- 
potheses would put her at exponentially increasing risks of gawking at moth- 
erese. In the English case, VSO is a clear loser and might show a parameter 
effect. The survivors happen to have the same head-directionality, without a 
parameter. For Turkish, this parameter faces problems. OVS, SVO and VSO 
put together are nearly as common as SOV in child language (Slobin and 
Bever 1982, Aksu-Koc and Slobin 1985). (Precise numbers are 53%, 37%, 
and 10%, for SOV+OSV, SVO+OVS, and VSO+VOS, respectively.) The age 
range for this performance is (2;3-3;8). Ekmekçi (1986) reports that, at (1;10), 
OV and VO are produced by the child. When children were asked to imitate 
motherese word order, they were successful 72% (SOV), 60% (OVS), 46% 
(SVO), 43% (OSV) of the time, at mean age (3;3) (Batman-Ratyosyan and 
Stromswold 1999). We would expect other parameters to be subsequently 
affected by this very flexible parameter value, because of the presumed pri- 
macy of head-directionality. The problem of charting the precise timing of 
parameter settings would be replaced in computationalist models by the task 
of understanding the complex interactions of linguistic hypotheses, assuming 
somewhat uniform motherese topics. Directionality will be there from day 
one. 

The computationalist perspective is considered to be a resurgent em- 
piricism of the Humean kind (but not necessarily the Lockean kind—see 
Machery 2006 for some cogent warnings and extended discussion), in which 
Hume’s associationism is not taken as the inner cause, but as the source 
of toolboxes in a computational mechanism (of resemblance, contiguity and 
causation), such as in acquisition and inference. (My personal attempt at these 
tasks was Bozsahin and Findler 1992, where we relied on, as in the works 
of others cited there and in the models developed later, the Humean con- 
straints on the hypothesis space.) Combinators too can be naturalized tool- 
boxes. Call them spandrels if you like, but crucially, they will be of Dennett’s 
(1995) kind, not Gould and Lewontin’s (1979), because they are not neces- 
sary mechanisms, just good solutions to a variety of interrelated problems 
about sequencing. 

Combinatory grammar and its radical lexicalization suggest limited in- 
variant combinatorics in lieu of universal grammar. This seems to require a 
symbolic base (and seriation) which language must tap into, and perhaps 
only that. Deacon (1988, 1997) has shown how indexical here-and-now 
knowledge can give rise to internal self-reorganization that leads to symbol sys- 
tems. Turing (or discrete) representability seems necessary for that, as argued 
in the book. 

Steedman (2002) suggests the involvement of BT in planned and coordi- 
nated activity in close cousins of ours, crucially without an LF, suggesting 
that LF and the syntactic specialization of the combinatory faculty—the syn- 
tactic type—might be the source of language. (As I noted earlier, there are 
disagreements about LF.) 

If language is a specialization of an earlier combinatory trait (and syntactic 
types are indeed different than visual, auditory or procedural combinatory 
categories), then we can expect adjacency to play the key role in this. That of 
course does not imply that there is evolution for grammars, perhaps not even 
for language. The selection pressure might be for better symbol processing, 
and more of it. 

It seems pointless to expect further exploits of seriation by nature, in the 
form of syntacticizing the combinators we have so far not seen in natural 
language syntax. The combinatory path for language, if true, would have had 
to have been opportunistically selected for a long time, two million years or 
more. 

In this regard, the combinatory view allows us to reassess certain claims 
about exaptationism and creative use of language, the latter understood to be 
a product of infinitude. There seems to be no forceful argument to treat them 
as facts. Exaptationism as an effect (but not as a cause) is already built in to 
Darwin’s theory, as opportunistic selection. 

We can put in context the proposals about whether language is a case of 
exaptation or opportunistic selection. Take for example Yang’s (2006) title, 
The infinite gift. As we have seen, infinitude or finitude does not make lin- 
guistic theorizing more (or less) exciting, for we will need a theory even if 
language is vast but finite. Whatever the size and bounds, that theory must 
be about discrete units—words—if the current reasoning in the book shows 
promise. And for discretely representable linguistic knowledge, giving se- 
mantics to order, and order alone, to lead to structures in language seems to 
be a scientifically more conservative start. Likewise, given Darwinian adapta- 
tionism and opportunistic selection for combinatory traits, rather than Gould- 
style exaptationism, it seems that we would earn the language with big L over 
a long time, rather than take it as an exapted gift. 


Appendices 


A: Lambda calculus 


This appendix briefly reviews lambda calculus. It is not a general or comprehensive 
introduction to the topic. The material covered relates to the main body of the text, 
where it is used frequently. 

Lambda terms (equivalently, λ-terms) are well-formed lambda expressions. 
They are recursively defined as follows. 


λ-words are constructed from the alphabet 

    x, y, z, ... for variables, 
    1, 2, a′, ... for invariables (constants), 
    λ for abstractor (lambda binding), 
    (, ) for grouping (parentheses). 

λ-terms are the set Λ such that 

    variables and invariables are in Λ, 
    if M ∈ Λ then (λx.M) ∈ Λ where x is an arbitrary variable, 
    if M, N ∈ Λ then (MN) ∈ Λ. 


By convention, we write multiple lambda bindings with a single dot: λx.λy.xy is 
written as λxλy.xy. 

Also by convention, lambda bindings associate to the right, and juxtaposition 
associates to the left. λxλyλz.xyz is the same as λx(λy(λz((xy)z))). 

A variable is free if it is not in the scope of its lambda binding, bound otherwise. 
For example, x is free in x + 2, λy.x and (λx(a′b′))x. It is bound in λx.x and in 
λxλy.xy. Within the inner body of the last lambda term, xy, both variables are said 
to have free occurrences because there is no lambda binding in the body. 
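
The recursive definition of λ-terms and the notion of a free variable can be tried out in a few lines of Python. This is only an illustrative sketch: the class names Var, Const, Lam and App and the function free_vars are my own labels for this example, not notation used elsewhere in the book.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:          # a variable such as x, y, z
    name: str


@dataclass(frozen=True)
class Const:        # an invariable (constant) such as a' or 2
    name: str


@dataclass(frozen=True)
class Lam:          # (λx.M)
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:          # (MN)
    fun: "Term"
    arg: "Term"


Term = Union[Var, Const, Lam, App]


def free_vars(t: Term) -> frozenset:
    """Names of the variables that occur free in t."""
    if isinstance(t, Var):
        return frozenset({t.name})
    if isinstance(t, Const):
        return frozenset()
    if isinstance(t, Lam):
        # The binder removes its own variable from the free ones of the body.
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)


# λy.x : x is free
example1 = Lam("y", Var("x"))
# (λx(a'b'))x : the outer x is free
example2 = App(Lam("x", App(Const("a'"), Const("b'"))), Var("x"))
# λxλy.xy : no free variables, but x and y are free inside the body xy
example3 = Lam("x", Lam("y", App(Var("x"), Var("y"))))
```

Calling free_vars on example3 gives the empty set, whereas calling it on the inner body example3.body.body gives both x and y, which is the distinction drawn in the last paragraph.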


Lambda conversions are operations that denote equivalences among lambda 
terms. When used in the direction of eliminating a lambda binding, they are called 
reductions. If a lambda is introduced they are called abstractions. 

The conversions rely on the property of substitution for bound variables. Eta 
conversion shows the behavioral equivalence of the typed objects with and without 
variables. Beta conversion is the main mechanism to establish function application 
and function abstraction as two sides of the same coin. Alpha conversion shows the 
equivalence of bound variables under substitution. Together they define equivalence 
in the function treatment of lambda calculus. 


substitution    M[a/x] stands for substituting a for free occurrences of x in M. 

η-conversion    λx.fx =η f, if x is not free in f. 

β-conversion    (λx.M)a =β M[a/x] 

α-conversion    λx.M =α λy.(M[y/x]), if scopes of variables in λx.M and 
                λy.(M[y/x]) are the same. 

equivalence     M = N iff Ma =αβη Na, for all lambda terms M, N, a. 


Read ‘=’ as ‘behaves the same’, not as ‘identical’. From substitution and beta 
reduction, we get (λx.fx)a =β fx[a/x], which is the same as fa, hence the associ- 
ation of beta with function application and abstraction. By equivalence, λx.fx = f 
too, hence the same behavior when f is supplied with a. The condition on eta con- 
version ensures that we do not change the behavior of objects; λx.(λy.yx)x, in which 
x is free in λy.yx, is not equivalent to λy.yx. Similarly, the condition on alpha con- 
version avoids an accidental capture of the same names, for example λx.y ≠α λy.y, 
and λxλy.xy ≠α λyλy.yy. 


Normalization refers to the successive application of a conversion until it no 
longer applies. For example, the beta normalization of (λxλy.f′yx)(a′)(b′) is two 
applications of beta reduction giving f′b′a′. The eta normalization of λxλy.f′xy is 
f′. Some lambda terms have no normal forms because the process may not always 
terminate: (λx.xx)(λx.xx) has no beta normal form. 

Normal-order evaluation of a lambda term is the application of beta reduction 
to the leftmost outermost reducible expression (redex) first. In (λx.x)((λy.y)a′) there 
are two redexes, and normal-order chooses to reduce it to (λy.y)a′, i.e. the leftmost 
redex is reduced first, with the second one passed to it unevaluated. The Church-Rosser 
theorem establishes the result that two distinct sequences of reductions from the 
same lambda term will yield the same normal form if there is one. For the example 
above, it is a′. 
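
Substitution, beta reduction and normal-order evaluation can be put together in a small sketch. Everything in it is an assumption made for illustration: terms are encoded as Python tuples, the helper names fv, fresh, subst, step and normalize are hypothetical, fresh variables are called v0, v1 and so on, and a step limit stands in for the fact that some terms have no normal form.

```python
import itertools

# Terms are tuples: ("var", x), ("con", c), ("lam", x, body), ("app", fun, arg).


def fv(t):
    """Free variables of a term."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "con":
        return set()
    if tag == "lam":
        return fv(t[2]) - {t[1]}
    return fv(t[1]) | fv(t[2])


def fresh(avoid):
    """A variable name (v0, v1, ...) that does not occur in `avoid`."""
    return next(f"v{i}" for i in itertools.count() if f"v{i}" not in avoid)


def subst(m, x, a):
    """M[a/x]: replace the free occurrences of x in m by a."""
    tag = m[0]
    if tag == "var":
        return a if m[1] == x else m
    if tag == "con":
        return m
    if tag == "app":
        return ("app", subst(m[1], x, a), subst(m[2], x, a))
    y, body = m[1], m[2]
    if y == x:
        return m                          # x is rebound here; nothing free below
    if y in fv(a) and x in fv(body):
        z = fresh(fv(a) | fv(body) | {x})
        body = subst(body, y, ("var", z))  # alpha conversion avoids capture
        y = z
    return ("lam", y, subst(body, x, a))


def step(t):
    """One leftmost-outermost beta step, or None if t is in beta normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(f[2], f[1], a)   # the argument is passed unevaluated
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None


def normalize(t, limit=100):
    """Beta normal form of t, or None if none is found within `limit` steps."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    return None


ca, cb, cf = ("con", "a'"), ("con", "b'"), ("con", "f'")
x, y = ("var", "x"), ("var", "y")

# (λx.x)((λy.y)a')
two_redexes = ("app", ("lam", "x", x), ("app", ("lam", "y", y), ca))
# (λxλy.f'yx)(a')(b')
swap = ("app", ("app", ("lam", "x", ("lam", "y", ("app", ("app", cf, y), x))), ca), cb)
# (λx.xx)(λx.xx)
w = ("lam", "x", ("app", x, x))
omega = ("app", w, w)
```

With these definitions, step(two_redexes) yields the encoding of (λy.y)a′, normalize(swap) yields the encoding of f′b′a′, and normalize(omega) gives up after the step limit, which is the practical face of the term having no beta normal form.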


B: Combinators 


This appendix covers some mathematical aspects of combinators. Much of the book 
is about turning combinators into linguistic devices for explanation. These aspects 
are covered in the main body of the text. 

Combinators are lambda terms with no free variables. As such they epitomize 
the compositional behavior of functional objects without a need for variables. By a 
convention going back to Curry and Feys (1958) they are written as single letters 
in bold. No extra notation is needed to describe their behavior. The ones considered 
most basic are defined below. The names were given by Curry and Feys. 


B ≝ λfλgλx.f(gx)     (compositor) 
S ≝ λfλgλx.fx(gx)    (substitutor) 
C ≝ λfλgλx.fxg       (elementary permutator) 
T ≝ λfλg.gf          (commutator) 
W ≝ λfλx.fxx         (duplicator) 
K ≝ λfλg.f           (cancellator) 


For example, λx.a′xx is equivalent to λx.Wa′x, which is eta-normalizable to 
Wa′. 

Combinators established computability about a decade before Turing machines. 
Their equivalent power can be seen without proof: K can delete any sequential ma- 
terial, S can expand and compose sequences, C can swap their order, which are the 
basic mechanisms that give the Turing machines their power. In this sense the Tur- 
ing model is a formal specification of an algorithm in detail, and combinators are its 
global compositional view. 

Normal-order evaluation of combinators evaluates the leftmost outermost com- 
binator first. For example, 

BSCfga = S(Cf)ga = Cfa(ga) = f(ga)a 

As in the case of lambda calculus, the process may be nonterminating: 
WWW evaluates to itself indefinitely. 
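
The definitions above can be transcribed directly as curried Python functions, which gives a quick way of checking an evaluation such as the one for BSCfga. This is a sketch under assumptions of my own: f, g and a are hypothetical test objects that merely record how they were applied, and Python evaluates arguments eagerly rather than in normal order.

```python
# The basic combinators as curried functions.
B = lambda f: lambda g: lambda x: f(g(x))
S = lambda f: lambda g: lambda x: f(x)(g(x))
C = lambda f: lambda g: lambda x: f(x)(g)
T = lambda f: lambda g: g(f)
W = lambda f: lambda x: f(x)(x)
K = lambda f: lambda g: f

# Hypothetical test objects: f takes two arguments, g takes one,
# and both simply record what they were applied to.
f = lambda x: lambda y: ("f", x, y)
g = lambda x: ("g", x)
a = "a"

lhs = B(S)(C)(f)(g)(a)   # BSCfga
rhs = f(g(a))(a)         # f(ga)a
```

Here lhs and rhs come out as the same record, in agreement with the last step of the evaluation. The nonterminating case should not be tried this way: W(W)(W) keeps calling itself until Python raises a RecursionError.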

For the sake of completeness, I list the well-known combinators in Table 7. The 
names in the table are from Smullyan’s (1985) tale of combinators as singing birds. 
They are in common use as well. 

As Curry, Feys and Smullyan note, there are many equivalences between the 
combinators. This aspect opens the way to linguistic theorizing about which ones 
must be included in the grammar or in the lexicon; these matters therefore belong 
to the main body of the book. 
Table 7. Some well-known combinators 

I    Ix = x                                     Identity bird 
Y    Yx = y = xy for some y depending on x      Sage bird 
U    Uxy = y(xxy)                               Turing bird 
K    Kxy = x                                    Kestrel 
T    Txy = yx                                   Thrush 
W    Wfx = fxx                                  Warbler 
B    Bxyz = x(yz)                               Bluebird 
C    Cxyz = xzy                                 Cardinal 
S    Sxyz = xz(yz)                              Starling 
Φ    Φxyzw = x(yw)(zw) 
Ψ    Ψxyzw = x(yz)(yw) 
J    Jxyzw = xy(xwz)                            Jay 


The power of a combinator is a generalization of its behavior. For example, 
Bⁿf composes f with n-argument functions, whereas B composes two one-argument 
functions. It is defined as follows: 

X⁰ = I, 

X¹ = X, 

Xⁿ = BXXⁿ⁻¹ for n > 1, for a combinatory object X. 

Therefore, B²fgab = BBBfgab = B(Bf)gab = Bf(ga)b = f(gab). Powers are 
not distinct combinators, and they serve a crucial role in generalizing the linguistic 
notion of arity. 
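
The recursive clause for powers can be checked with curried Python functions. In this sketch the function name power and the recording functions f, g and g3 are assumptions introduced for the example, and only powers from 1 upwards are covered.

```python
B = lambda f: lambda g: lambda x: f(g(x))


def power(X, n):
    """The n-th power of X: X itself for n = 1, and B X (power(X, n - 1)) for n > 1."""
    if n < 1:
        raise ValueError("this sketch only covers n >= 1")
    return X if n == 1 else B(X)(power(X, n - 1))


B2 = power(B, 2)   # BBB
B3 = power(B, 3)   # BB(BBB)

# Hypothetical test functions that record their arguments.
f = lambda x: ("f", x)
g = lambda a: lambda b: ("g", a, b)
g3 = lambda a: lambda b: lambda c: ("g", a, b, c)

two = B2(f)(g)("a")("b")          # f(gab)
three = B3(f)(g3)("a")("b")("c")  # f(gabc)
```

The value of two is f applied to the result of g on both of its arguments, as in the worked example, and three shows the same pattern for a three-argument function.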

A supercombinator is a combinator in normal form in which all its argument- 
taking lambdas (its lambda bindings) can be grouped to the left, i.e. its behavior 
can be made fully transparent looking from the outside. The formal definition is as 
follows (from Hughes 1984): 

Let Σ = λx₁⋯λxₙ.E where E is not a lambda abstraction. Σ is a supercom- 
binator of arity n if (a) Σ is a combinator, (b) any lambda abstraction in E is a 
supercombinator, and (c) n ≥ 0. 

In other words, if we can group all bindings before E, and leave no free variables 
inside E which must be remembered—bound—outside, then we have a supercom- 
binator. Almost all the combinators we have seen so far are supercombinators, but 
not all combinators are supercombinators. The function λy.y(λx.yx) is not a super- 
combinator because y occurs free in the inner lambda term. Supercombinators will 
directly relate to the argument-taking behavior of the linguistic notion of ‘head of a 
construction’. 

Fixpoint combinators stand apart from supercombinators because they allow us to 
capture recursion without use of names or variables. One such combinator is Y. Its 
definition is given below. Note that Y is not a supercombinator. It finds the fixpoint 
of any function h, as shown. 


Y ≝ λh.(λx.h(xx))(λx.h(xx)) 

Yh = h(Yh) 

It is truly remarkable that with the use of Y, recursion can be achieved with- 
out names. I borrow from a classic in the field of programming, Peyton Jones 
(1987: §2.4), to tell the story. 

Consider the following definition of the factorial function, where recursion is 
explicit due to naming (which is something we cannot do in lambda calculus). 

FAC = λn. IF (= n 0) 1 (× n (FAC (− n 1))) 

This recursive definition can be turned into self-application without recursion as 
below, because of beta conversion. Note that H is not recursive. 

Let H = λfλn. IF (= n 0) 1 (× n (f (− n 1))) 
Then FAC = H FAC 
because FAC n =β H FAC n for any natural number n ≥ 0 

The point of course is to be able to recurse without names on any function, not 
just the factorial. This is where the combinator Y can help. The factorial can be 
defined without recursion or names. The steps below are borrowed from Peyton Jones 
(1987: 27). They show that it does the equivalent of the recursive factorial. 

FAC = Y H, where H is as defined above. 

FAC 1 = 
Y H 1 = 
H (Y H) 1 = 
(λfλn. IF (= n 0) 1 (× n (f (− n 1)))) (Y H) 1 = 
(λn. IF (= n 0) 1 (× n (Y H (− n 1)))) 1 = 
IF (= 1 0) 1 (× 1 (Y H (− 1 1))) = 
× 1 (Y H 0) = 
× 1 (H (Y H) 0) = 
× 1 ((λfλn. IF (= n 0) 1 (× n (f (− n 1)))) (Y H) 0) = 
× 1 ((λn. IF (= n 0) 1 (× n (Y H (− n 1)))) 0) = 
× 1 (IF (= 0 0) 1 (× 0 (Y H (− 0 1)))) = 
× 1 1 = 
1 


The problematic property of Y is that it cannot be reduced to a form which cannot 
be reduced any further, thus the only way to stop recursion by Y is to reach nonrecur- 
sive (base) cases, such as reaching the ‘× 1 1’ step above. Below is YK's infinite 
expansion. The base cases are an infinite supply of semantic objects following YK. 

YK = K(YK) = K(K(YK)) = K(K(K(YK))) = ⋯ 
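
The nameless factorial can be run in Python, with one caveat that must be stated loudly. Python evaluates arguments eagerly, so Y exactly as defined above would keep expanding before H ever got a chance to reach its base case. The sketch therefore swaps in the eta-expanded variant λh.(λx.h(λv.xxv))(λx.h(λv.xxv)), often called Z, which delays the self-application until it is needed. The names Z, H and FAC are labels for this example only.

```python
# Eta-expanded fixpoint combinator for a strict language.
# Assumption: this is a stand-in for Y, not Y itself.
Z = lambda h: (lambda x: h(lambda v: x(x)(v)))(lambda x: h(lambda v: x(x)(v)))

# H is not recursive: it receives the function to call on n - 1 as f.
H = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

# The factorial, defined without recursion or self-reference by name.
FAC = Z(H)
```

FAC(1) follows the same path as the derivation above and stops at the base case, and FAC(5) gives 120. Nothing in H or Z refers to FAC.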

In the book I follow the convention of writing a syntacticized combinator with its 
arity as a prefixed subscript. The subscript will be omitted when the arity is the same as 
in its combinatory definition, for example 2 for T and K, 3 for B, S etc. Curry and Feys 
(1958) use the notation (X)ₙ for the same purpose where n is the arity, but the use of 
parentheses for that purpose is somewhat unfortunate because they do so much work 
on the right-hand side of the definitions. Other options such as Xₙ, Xⁿ, X₍ₙ₎ and X[n] are 
used for other purposes by Curry and Feys (1958). I note the convention for easy 
reference: 

For a combinatory object X, 

its arity k in a particular use is denoted as ₖX. (arity-in-use) 

Arity is omitted when it is the same as in X's definition. 

For example, ₂T is the same as T. ₂B is the binary use of ternary B. 


C: Variable elimination 


There is nothing any theory can do if a variable is free to vary. The process of variable 
elimination therefore relates to bound variables. It can be done in various ways, as 
Frege, Schönfinkel, Geach and Quine have shown. This appendix is concerned only 
with the possibility of variable elimination. (The manner in which it is done bears on 
linguistic theory, and is dealt with in the main body.) 

First we note that if all bound variables of a function symbolize the applicative 
behavior of the function, i.e. if they are used in the order they are lambda-bound, and 
only once, then eta conversion can do all the work, as follows. 

λx₁⋯λxₙ₋₁λxₙ.fx₁⋯xₙ₋₁xₙ equals, by associativity, 

λx₁⋯λxₙ₋₁λxₙ.((fx₁⋯xₙ₋₁)xₙ) =η 

λx₁⋯λxₙ₋₁.fx₁⋯xₙ₋₁ =η 

⋮ 

λx₁.fx₁ =η 

f 

Therefore eta conversion is equivalent to saying that all semantic invariants are 
inherently typed. Once we know that f is say a three-argument function with applica- 
tive behavior, then writing f³ or just f is sufficient. 

The rest of the dependencies, for example λx.fxx or λx.f(gx), are not eta- 
normalizable without the help of combinators. For example, the first one of these 
is eta-normalized as λx.fxx = λx.Wfx =η Wf and the second as Bfg. Schönfinkel's 
work showed that two combinators, S and K, suffice for this task, because S can be seen 
as a mechanism of pushing the lambda bindings inside, which will eventually reach a base 
case such as λx.x, λx.y or λx.a′, which are lambda terms with the simplest body of functions. 
These properties follow from the following equivalences. 

(λx.MN)a =β S(λx.M)(λx.N)a 

hence (λx.MN) = S(λx.M)(λx.N) from equivalence in lambda calculus. 

The elimination is completed by the following equivalences: 

λx.y = Ky        λx.a′ = Ka′ 

λx.x = I 

The equivalences are applicable to any lambda-definable (hence Turing com- 
putable) object. For example, λx.MNP is equivalent to λx.(MN)P because of left- 
associativity, thus any number of lambda terms can be handled by S. In case of mul- 
tiple abstractions such as λxλy.love′xy, we have to apply S-pushing to the innermost 
lambda first. 

Knowing that I = SKK, we can eliminate all bound variables and write every- 
thing in terms of S, K and the invariables. For example, everything except hit’ and 
john’ can be eliminated from the following formula. 

λx.hit′ x john′ = 
S(λx.hit′ x)(λx.john′) = 
S(S(λx.hit′)(λx.x))(K john′) = 
S(S(K hit′) I)(K john′) = 
S(S(K hit′)(SKK))(K john′) 
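
The equivalences for S, K and I amount to a small recursive procedure, sketched below. The encoding is an assumption made for this example: atoms are Python strings (with S, K and I reserved), application and abstraction are tuples, and the names abstract, eliminate, expand_i, show, step and normalize are hypothetical. The procedure applies the equivalences literally, without any of the usual shortcuts, and works on the innermost lambda first.

```python
# Atoms are strings (variables, constants, and the reserved names S, K, I).
# ("app", f, a) is application; ("lam", x, body) is abstraction.


def app(f, a):
    return ("app", f, a)


def abstract(x, t):
    """Eliminate the binding of x from the lambda-free term t."""
    if t == x:
        return "I"                       # λx.x = I
    if isinstance(t, str):
        return app("K", t)               # λx.y = Ky and λx.a' = Ka'
    # λx.MN = S(λx.M)(λx.N)
    return app(app("S", abstract(x, t[1])), abstract(x, t[2]))


def eliminate(t):
    """Remove every lambda binding, innermost first."""
    if isinstance(t, str):
        return t
    if t[0] == "app":
        return app(eliminate(t[1]), eliminate(t[2]))
    return abstract(t[1], eliminate(t[2]))


def expand_i(t):
    """Rewrite I as SKK in a lambda-free term."""
    if t == "I":
        return app(app("S", "K"), "K")
    if isinstance(t, str):
        return t
    return app(expand_i(t[1]), expand_i(t[2]))


def show(t):
    """Write a lambda-free term with left-associative juxtaposition."""
    if isinstance(t, str):
        return t
    arg = show(t[2])
    return show(t[1]) + (arg if isinstance(t[2], str) else "(" + arg + ")")


def step(t):
    """One leftmost-outermost S/K/I step on a lambda-free term, or None."""
    head, args = t, []
    while not isinstance(head, str):
        args.insert(0, head[2])
        head = head[1]
    if head == "I" and len(args) >= 1:
        new, rest = args[0], args[1:]
    elif head == "K" and len(args) >= 2:
        new, rest = args[0], args[2:]
    elif head == "S" and len(args) >= 3:
        new, rest = app(app(args[0], args[2]), app(args[1], args[2])), args[3:]
    else:
        # The head is not a redex; reduce the leftmost reducible argument.
        for i, a in enumerate(args):
            r = step(a)
            if r is not None:
                new, rest = head, args[:i] + [r] + args[i + 1:]
                break
        else:
            return None
    for a in rest:
        new = app(new, a)
    return new


def normalize(t, limit=1000):
    """Normal form of a lambda-free term, or None if the step limit is hit."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    return None


hit_john = ("lam", "x", app(app("hit'", "x"), "john'"))
sk_term = eliminate(hit_john)
```

show(sk_term) prints S(S(Khit')I)(Kjohn'), and show(expand_i(sk_term)) prints S(S(Khit')(SKK))(Kjohn'), the last line of the derivation. Supplying an argument, as in normalize(app(sk_term, "mary'")), gives back the application of hit' to mary' and john', so the variable-free form behaves like the original lambda term.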

This is a dangerous practice because of K’s powers of deletion. The reader can 
verify that the following formula works endlessly to reproduce itself, due to having 
both S and K. Some steps are shown. 

SS(KI)(SS(KI))(SS(KI)) = 

S(SS(KI))(KI(SS(KI)))(SS(KI)) = 

SS(KI)(SS(KI))(KI(SS(KI))(SS(KI))) = 




D: Theory of computing 


The theory of computing features quite often in the book because it has empirical and 
theoretical consequences for combinatory linguistics. In the first aspect, the children 
seem to be facing a computationally tractable problem in language acquisition and 
stagewise development. Granted that there have been some warnings about using the 
algorithmic complexity theory at face value for this task (e.g. Berwick and Weinberg 
1982), the narrower claim of the theory covered in the book is that a performance 
grammar is competence grammar because it delivers the immediate assembly of all 
constituents and their meanings, partial or full. Theoretically, another aspect of algo- 
rithmic computation seems very relevant to natural language: discrete representabil- 
ity, without which complexity theory is meaningless. Because Turing (1936) was the 
first to give us a view of functions unheard of before, as a step-by-step computing 
over a representation, I will refer to it as Turing representability. The appendix covers 
these aspects very briefly from a mathematical perspective. 

A Turing Machine (TM) is a finite-state abstract machine with an unlimited 
supply of sequential memory (usually called "tape") to which it can write, rewrite 
and scan one cell at a time: 

[Figure: a tape of cells a₁, a₂, ..., a₁₆, with a tape head connecting one cell to the FSM] 


A tape cell may contain a symbol aᵢ or it may be blank. The FSM is the finite- 
state machine component. Before computation starts, a TM is in the start state of 
the FSM with the tape head pointing to the beginning of the input, if any, and the 
remaining cells are assumed to be blank. It stops when it reaches a “halt state” in 
the FSM (there are many alternative definitions; this one suffices for our purpose of 
computing a function). 

It has no notion of physical timing; its measure of a problem size is a combination 
of the number of states and the number of steps it takes to compute a function. We can 
assume that every basic step (read, write, rewrite or move left or right on the current 
tape head, and/or change the current state in the FSM) takes a constant time, but 
that is in theory unnecessary; it might as well happen simultaneously. What matters 
is that taking the next step requires the notion of "next", and that is either one cell, 
one symbol or one state, so that once we take the step we will have moved one step 
more than the earlier status in some regard above. These are the bases of complexity 
measures in the theory of algorithms. 

A configuration of a TM is a collection of its current state, the current pointer 
to a cell in the sequential memory, and its memory content. Trailing blank cells are 
not considered part of the content. Memory content can of course be indefinitely 
stretched, which we can capture as a regular expression. 

A Deterministic Turing Machine (DTM) is a TM in which every configuration 
is uniquely determined by the previous one. This is Turing's capture of the notion 
of function in a step-by-step manner. If there is more than one way to take the steps 
of the function, a DTM can simulate these choices by making use of another tape to 
keep track of its moves while checking whether they all agree on the result, which 
we can keep on yet another tape. This is Turing's capture of the notion of relation, 
which is a function over powersets of inputs and outputs. We know that multi-tape 
Turing machines and other variations such as multiple tape heads, nondeterminism, 
random-access memory rather than the tape do not give us more things to compute 
than a standard TM (see Hopcroft and Ullman 1979, Lewis and Papadimitriou 1998 
for these results). 
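
A deterministic machine of this kind is easy to simulate, which may help to make the notions of step and configuration concrete. The sketch below rests on assumptions of my own: the transition table is a Python dictionary, the state names q0 and halt and the blank symbol "_" are arbitrary choices, and a step budget guards against machines that never halt. The example machine, which adds one stroke to a unary numeral, is hypothetical and is not taken from the sources cited.

```python
def run_dtm(delta, tape, state="q0", halt="halt", blank="_", max_steps=10_000):
    """Run a deterministic TM.

    delta maps (state, symbol) to (new_state, written_symbol, move),
    where move is -1 (left), 0 (stay) or +1 (right).
    Returns the final configuration (state, head position, content)
    together with the number of steps taken.
    """
    cells = dict(enumerate(tape))   # sparse tape: position -> symbol
    head, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        symbol = cells.get(head, blank)
        # Each configuration determines the next one uniquely.
        state, cells[head], move = delta[(state, symbol)]
        head += move
        steps += 1
    used = [i for i, s in cells.items() if s != blank]
    content = ""
    if used:
        # Blank cells outside the used region are not part of the content.
        content = "".join(cells.get(i, blank) for i in range(min(used), max(used) + 1))
    return state, head, content, steps


# Hypothetical machine: move right over the strokes, then write one more.
successor = {
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("halt", "1", 0),
}

result = run_dtm(successor, "111")
```

On the input 111 the machine halts after four steps with 1111 on the tape. The step count, not any physical clock, is what complexity measures are built on. A missing entry in the table raises a KeyError, which is this sketch's way of saying that the machine is stuck.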

A TM is said to be nondeterministic (NDTM) if it can make a "guess" of the 
solution and check (as a DTM) whether it is indeed a solution. We can take the 
guessing stage to be equivalent to putting on another tape the precise sequence of 
steps to follow. In this regard we do not get a new class of computation but a new 
class of how to do computing, i.e. a complexity measure. 

An algorithm is a DTM that always decides, i.e. one that stops on every input 
to make a decision. A nondeterministic algorithm does the same with a NDTM. A 
procedure (or heuristic) is a DTM or NDTM that semi-decides, i.e. one that stops 
on some inputs to make a decision. 

Undecidable problems are "functions" for which there is no algorithm (deter- 
ministic or nondeterministic). The Halting Problem of the Turing machine in which 
a TM takes as input another TM and tries to decide whether it stops on all its inputs, 
is one such problem. The problem is at least formulable, but it is not solvable. Some 
problems are expressible but not formulable, for example: “what is the next number 
after π?” 

In the book a problem will be called Turing-representable if it can be written as 
a TM (but not necessarily solved by it). For example, the Halting Problem is Turing- 
representable as below (from Lewis and Papadimitriou 1998; halts(P,X) means P 
halts on input X). It is the diagonal(diagonal) program. The π question is not Turing- 
representable. 

diagonal(X): 

a: if halts(X,X) then goto a else halt. 

Turing-representability ties in with another line of development that gave us the
understanding of limits of computability today: recursion theory. Primitive recursive
functions are those which can be defined by identity, succession, composition
and recursion. The successor function succ(n) = n + 1 is crucial in this definition, 
which gives us the link to Turing-representability by providing a notion of “next”. 
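
The four ingredients can be sketched in a few lines. This is only an illustration under assumptions of my own: the names identity, succ, compose and prim_rec are hypothetical, and the bounded for-loop stands in for the recursion scheme. The point is that the number of iterations is fixed by the argument n before the loop starts, with succ supplying the "next" value.

```python
def identity(x):
    """Identity: returns its argument unchanged."""
    return x

def succ(n):
    """Successor: the notion of 'next' that the definition relies on."""
    return n + 1

def compose(f, *gs):
    """Composition: xs -> f(g1(xs), ..., gk(xs))."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(base, step):
    """Primitive recursion: h(0, x) = base(x); h(n+1, x) = step(n, h(n, x), x)."""
    def h(n, *xs):
        acc = base(*xs)
        for i in range(n):  # bounded loop: exactly n iterations, known in advance
            acc = step(i, acc, *xs)
        return acc
    return h

# add(n, x) = n + x, built from identity and succ only
add = prim_rec(identity, lambda i, acc, x: succ(acc))
# mul(n, x) = n * x, built from add
mul = prim_rec(lambda x: 0, lambda i, acc, x: add(x, acc))
# double(x) = add(x, x), built by composition
double = compose(add, identity, identity)
```

For instance, add(3, 4) applies succ three times to 4, and mul(3, 4) applies add three times starting from 0.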

A computationally tractable problem is one for which there is an algorithm that 
works on a polynomial function of the size of the problem for a DTM (i.e. its number
of states, the number of steps it must take and the space it must use, as a function of
the Turing-representable input). The complexity class 𝒫 symbolizes such problems.
Computing scientists sometimes use the term “polynomial time function" to talk 
about these problems, and care must be taken not to misunderstand the word time. 
It does not measure the physical time or space but abstract time and abstract space, 
which are the abstract measures of problem "size" from Turing representations. (In 
this sense computation as we know today cannot be a natural law as Chomsky once 
suggested.) 

A computationally intractable problem is one for which there is a NDTM that 
can guess a solution and check its validity in polynomial time. This is a very important
class of complexity, called 𝒩𝒫, for "nondeterministically polynomial". Intractable
problems, then, have an exponential algorithmic solution, all of which can
be checked in polynomial time individually. 

The order of a function limits its behavior on the abstract size of the problem
"from above." The order of f is g, written f = O(g) by convention, if for some positive
constants c and n₀, f(n) < cg(n) for all n > n₀. If f is n² it is O(n²) and also
O(n³) etc. It is O(2ⁿ) too, but n³ is not O(n²) or O(n). This notation allows us to
equate 𝒫 problems with O(nᵏ) order, for some constant k, and the 𝒩𝒫 class with
O(kⁿ), where n is the problem size in the Turing sense.

Many interesting problems are 𝒩𝒫, e.g. finding the possibility of the truth of a
set of disjunctive logical formulae such as A₁ ∨ A₂ ∨ A₃ and ¬A₁ ∨ A₂ ∨ A₃. If we are
given the truth conditions of Aᵢ, we can check in polynomial time whether the set is
satisfiable (i.e. true in all its clauses). If not, we must check every truth assignment,
which is exponential on the size of the set, therefore computationally intractable.
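
The contrast between checking and searching can be made concrete with a small sketch. The clause encoding and the names check and brute_force are assumptions made for illustration; they are not from the text. check looks at each literal once, whereas brute_force may have to visit all 2ⁿ assignments of n variables.

```python
from itertools import product

# A clause is a list of literals; a literal is (variable, is_positive).
clauses = [
    [("A1", True), ("A2", True), ("A3", True)],    # A1 v A2 v A3
    [("A1", False), ("A2", True), ("A3", True)],   # not-A1 v A2 v A3
]

def check(clauses, assignment):
    """Polynomial-time check: every clause has at least one true literal."""
    return all(any(assignment[v] == pos for v, pos in c) for c in clauses)

def brute_force(clauses):
    """Exponential search: try assignments one by one until one passes check."""
    variables = sorted({v for c in clauses for v, _ in c})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if check(clauses, assignment):
            return assignment
    return None  # no assignment satisfies all clauses
```

Given an assignment, check answers immediately; without one, brute_force enumerates candidates and reuses check on each.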

The fact that we do know this even if generating the entire solution space may 
wear us down relates the notion of Turing-representability, algorithms, competence 
and performance at the abstract level rather than concrete. This is the significance of 
the theory of computing for linguistics. It is an intensional body of knowledge. 

In this regard a computational look at language cannot be understood just by 
looking at problem complexity, timing or space through the classes 𝒫 and 𝒩𝒫. The
approach and these complexity classes are intrinsically tied to abstract and discrete 
representability, which translate to scaling up of the knowledge of competence and 
identifying similarly characterizable problems of cognition. 

We may compare a computational solution with a noncomputational one to see 
the nature of the argument. Consider sorting n quantities, say a sheaf of spaghetti 
rods to be sorted by length. A noncomputational solution in the sense of avoiding 
a Turing representation might be to conceive them as physical quantities such as 
weight, length and solidity. Sorting can be done with a variant of Dewdney's (1984) 
method, which is itself algorithmic and linear. We take the sheaf of spaghetti cut to 
different lengths, where length represents itself, i.e. an approximation of the quantity 
along which we sort. We bang the sheaf on the table and pick the ones that stick
out progressively. This is in principle instantaneous if we leave the sorted rods
in place rather than separate them. In contrast, a computational solution would be
to map the quantities to some representation, say numbers, and solve the problem
as a case of sorting anything that has a discrete representation, which is O(n log n).
In the first case we can claim to have understood gravity, solidity and eye measure- 
ment. In the second case we understand the nature of the problem. The first solution 
would not scale up even if we assume to have devised a representation of weights 
through spaghetti and tables because it is not translatable, it will not work in outer 
space, or for gases. We might search for a mapping of any problem so that grav- 
ity can solve it by natural laws, but in doing so we would be turning gravity into 
a computer, crucially one that works over a representation, which is the mapping 
itself. We can compare this approach to the original analog algorithm of Dewdney 
for spaghetti sort, which is indeed an algorithm therefore a computational solution 
because although it makes use of gravity to sort the spaghetti rods, it iterates on the 
broken spaghetti rods for sorting, hence its complexity measure is not the physical time
associated with gravity but the number of steps. (I am grateful to Mark Steedman for 
suggesting a look at Dewdney.) 
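
Read as an algorithm, the spaghetti method can be sketched as a loop that counts steps rather than time. The function name spaghetti_sort and the step counter are illustrative assumptions. The call to max() only simulates the hand meeting the tallest rod: it costs the simulation linear time, but it counts as a single step of the analog procedure, which is why the measure below is n steps.

```python
def spaghetti_sort(lengths):
    """Return the rods longest-first, together with the number of steps taken."""
    rods = list(lengths)      # the sheaf standing on the table
    ordered, steps = [], 0
    while rods:
        tallest = max(rods)   # analog step: the lowered hand meets the tallest rod
        rods.remove(tallest)  # pull that rod out of the sheaf
        ordered.append(tallest)
        steps += 1            # one step per rod, whatever gravity is doing
    return ordered, steps
```

The step count depends only on the number of rods, not on their lengths or on how long any physical process takes.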


E: Radical lexicalization and syntactic types 


Radical lexicalization refers to the process of rewriting all the rules in a phrase- 
structure grammar which do not make reference to a lexical item on the righthand 
side, as rules for the lexical items. These rules collectively become the lexical item's 
combinatory category. Two kinds of phrase-structure rules, context-free rules and 
linear-indexed rules, can always be given such a treatment. 

A linear-indexed grammar (LIG) is a context-free grammar equipped with a 
stack such that the lefthand side of rules can push, pop or pass the stack to the 
righthand side, and only to one symbol on the right (hence the term "linear"). Such 
grammars can generate strictly noncontext-free languages. For example, the grammar
below generates {aⁿbⁿcⁿ | n ≥ 0} (".." denotes the remainder of the stack "[ ]").


S[..] → a S[..b] c
S[..] → A[..]
A[..b] → A[..] b
A[ ] → ε
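
A derivation in this grammar can be traced with an explicit stack. The sketch below is an assumption-laden illustration, not a general LIG recognizer: the function name derive is hypothetical, and it simply applies the first rule n times, then the second, then the third until the stack is empty, then the fourth.

```python
def derive(n):
    """Derive a^n b^n c^n with the four rules, tracking the stack explicitly."""
    stack = []                 # S starts with an empty stack
    left, right = [], []
    for _ in range(n):         # S[..] -> a S[..b] c : emit a and c, push b
        left.append("a")
        right.append("c")
        stack.append("b")
    middle = []                # S[..] -> A[..] : the whole stack passes to A
    while stack:               # A[..b] -> A[..] b : pop b, emit b
        middle.append(stack.pop())
    # A[ ] -> epsilon : nothing more to emit once the stack is empty
    return "".join(left + middle + right)
```

The stack is what carries the count of a's over to the b's, which a context-free rule set cannot do while also matching the c's.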


This appendix shows the radical lexicalization of a context-free grammar. Linear- 
indexed grammars are related to CCG hence covered in the main text. 

Let us consider the following fragment of a context-free phrase-structure gram- 
mar to clarify the process. Exclusive terminals in the second column stand for the 
lexicon, and the grammar rules on the left refer to substantive categories S, NP, VP, 
V etc. 


S → NP VP        Det → every
NP → Name        N → chemist
NP → Det N       Name → Kafka
VP → Viv         Viv → arrived
VP → Vtv NP      Vtv → adored


First, the information about arity is redundantly specified in this grammar. The
rule VP → Vtv NP specifies that the verb is transitive because there must be an
NP following the verb, and the lexical entry by the preterminal Vtv duplicates that
information. We can take the rule to mean that a transitive verb, once it takes an
NP to the right, yields a VP. That is, Vtv=VP/NP in present terms. We could also
write NP=VP\Vtv=(S\NP)\((S\NP)/NP), because from the S rule we can write
VP=S\NP.

Similarly, Viv=VP. Because the NP rules have lexical anchors in this grammar
(name and determiner), we can follow the same strategy and arrive at Det=NP/N and 
Name=NP. We could also write N=NP\ Det if we wished. The S rule has no lexical 
anchor, thus we must write it as both NP=S/VP and VP=S\NP. We have arrived at 
the following equivalences: 




Vtv=VP/NP        Viv=VP         NP=VP\Vtv
NP=S/VP          VP=S\NP        Det=NP/N
Name=NP          N=NP\Det

Hence Vtv=(S\NP)/NP    Viv=S\NP
      NP=(S\NP)\((S\NP)/NP)
      NP=S/(S\NP)

We can eliminate the phrase-structure rules in the left column of the phrase- 
structure grammar above, and write only the lexical items with their new categories, 
to capture the same fragment of English surface syntax: 


every   := Det = NP/N = (S/(S\NP))/N
chemist := N = NP\Det = NP\(NP/N)
Kafka   := Name = NP = S/VP = S/(S\NP) and (S\NP)\((S\NP)/NP)
arrived := VP = S\NP
adored  := VP/NP = (S\NP)/NP


What we cannot eliminate, of course, is the right column because that would 
change the empirical coverage of the grammar. 
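
That the lexicon alone suffices for this fragment can be checked with a small chart parser that knows only function application. This is a sketch under stated assumptions: the helper names fwd, bwd, combine and parse are hypothetical, each word is given just its simplest category from the list above (chemist as N, Kafka as NP), and no combinator other than application is used.

```python
S, NP, N = "S", "NP", "N"

def fwd(res, arg):
    """The category res/arg: looks for arg to its right."""
    return (res, "/", arg)

def bwd(res, arg):
    """The category res\\arg: looks for arg to its left."""
    return (res, "\\", arg)

LEXICON = {
    "every":   fwd(NP, N),
    "chemist": N,
    "Kafka":   NP,
    "arrived": bwd(S, NP),
    "adored":  fwd(bwd(S, NP), NP),
}

def combine(left, right):
    """Forward and backward application only."""
    out = []
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        out.append(left[0])
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        out.append(right[0])
    return out

def parse(words):
    """CKY-style chart: chart[(i, j)] holds the categories spanning words[i:j]."""
    n = len(words)
    chart = {(i, i + 1): {LEXICON[w]} for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = set()
            for k in range(i + 1, j):
                for l in chart[(i, k)]:
                    for r in chart[(k, j)]:
                        cell.update(combine(l, r))
            chart[(i, j)] = cell
    return chart[(0, n)]
```

With no phrase-structure rules at all, parse("every chemist adored Kafka".split()) yields S, while a string such as arrived Kafka yields nothing because the backward slash of arrived fixes the order.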

Any context-free phrase-structure grammar and linear-indexed grammar can be 
reduced to its lexicon if we are willing to translate the distributional categories such 
as N, V, A, P to combinatory categories as above. We can do this because any rule
in these formalisms has one symbol on the left-hand side, with or without a stack,
therefore a functional reading of the rule from right to left is always possible. (LIGs
do not distribute a stack on the right, therefore the compositional reading of a LIG
rule is straightforward too.) Notice also that the redundancy of Vtv specification has
disappeared in the course of the translation.

One can argue that the elimination of unwanted ambiguity leads to another am- 
biguity, viz. NP=(S\NP)\((S\NP)/NP) and NP=S/(S\NP). We shall see in the text 
that the newly introduced ambiguity is not spurious; it relates to case marking. 

A combinatory syntactic type can be thought of as a collection of the applicative 
translation of all phrase-structure rules as above, plus their combinatory derivatives. 
For example, from S/(S\NP) and (S\NP)/NP in this order we also get S/NP because 
of composition. They can be thought of as the possible landscape of all types derived 
from the lexical items as a closure of the lexicon on combinators. A linguistic theory 
will select a subset in some principled way. 
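
The composition step mentioned above can be sketched on the same kind of category encoding. The tuple representation and the function name compose are assumptions for illustration; only forward composition is shown.

```python
def compose(left, right):
    """Forward composition: X/Y  Y/Z  =>  X/Z; None if it does not apply."""
    if (isinstance(left, tuple) and isinstance(right, tuple)
            and left[1] == "/" and right[1] == "/" and left[2] == right[0]):
        return (left[0], "/", right[2])
    return None

VP = ("S", "\\", "NP")          # S\NP
subject = ("S", "/", VP)        # S/(S\NP)
transitive = (VP, "/", "NP")    # (S\NP)/NP

derived = compose(subject, transitive)   # S/NP
```

The order matters: composing the two categories the other way round does not apply, in line with the phrase "in this order" above.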

An example type is shown below. 

likes := (S_fin\NP_3s)/NP : λxλy.like'xy

The breakdown of its constituents is shown below. Additionally, I use a common
index as a simple way to share the common features among syntactic types, for example
word := S_i/(S_i\NP_3sg,i). The i here is a shared set of features among which
there is the third-person singular emanating from the NP. To avoid notational clutter,
this convention is suppressed when it is not critical to the discussion. Feature
abbreviations are also quite common in the book, e.g. writing NP_3s to mean NP_AGR=3s.




[Figure: the parts of the lexical entry likes := (S_fin\NP_3s)/NP : λxλy.like'xy, labelled
as follows. likes: string; := : string-type correspondence; (S_fin\NP_3s)/NP: syntactic
type, with fin and 3s as features; λxλy.like'xy: interpretation, a predicate-argument
structure with semantic type (e,(e,t)). The figure also carries the labels category and
type descriptor.]

When no confusion arises, I will use the term category for the combinatory syntactic
type.

A consequence of radical lexicalization is that one end of the rules for the lexical
items is the syntactic type, and, since there is no other locus if lexicalization is strictly
followed, the other end has to be a predicate-argument structure, which bears
the semantic types. I cover the consequences of this result in the main text.

A semantic type is a narrowing of a predicate-argument object in possible val- 
ues. The type e is for things (Montague's entity), t is for propositions, and (e,t) is 
for predicates and properties, that is, for functions from things to propositions. For 
example the transitive verb like, with the semantics λxλy.like'xy, has the semantic
type (e,(e,t)). The eta-normalized version like' is assumed to carry this type along.
Thus, like’ is not of type t, which its bare form might suggest. In that sense, every 
semantic object has a type. 
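
How a semantic type narrows down what an object can combine with can be sketched as follows. The encoding of (a,b) as a Python pair and the names fn and apply_type are assumptions for illustration.

```python
E, T = "e", "t"

def fn(a, b):
    """The type of functions from a to b, written (a,b) in the text."""
    return (a, b)

def apply_type(f, x):
    """Type of the result of applying an object of type f to one of type x."""
    if isinstance(f, tuple) and f[0] == x:
        return f[1]
    raise TypeError(f"cannot apply {f} to {x}")

LIKE = fn(E, fn(E, T))   # like' : (e,(e,t))
```

Applying like' to one entity gives a predicate of type (e,t), and only a second application gives a proposition of type t, which is why bare like' is not of type t.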


F: Dependency structures 


Dependency structures may be specified over words in a string in some theories 
and over predicate-argument structures in others. This topic belongs to an appendix 
because it can be done without combinators. With combinators, it is defined over a 
predicate-argument structure, and this narrow view is explained here. 

A dependency structure is a relation between two semantic objects. For our 
purposes it can be defined as follows. 

A function depends on its arguments. (dependency) 

Juxtaposition xy means ‘x depends on y' (juxtaposition) 

It arises from a functional interpretation of the concept. (I use a distinction, more 
commonly made in computer science and computational linguistics, between func- 
tions, predicates and algorithms. The term function refers to opaque properties, such 
as arity, dependence and output, whereas the term predicate refers to transparent 
properties such as the event class, argument structure and their obliqueness, although, 
formally speaking, they are both functions or relations. Algorithms are functions that 
do something. I will use this term when we are interested in the task of the function 
rather than its dependencies or structures.) 

A predicate-argument dependency structure, abbreviated to PADS in the
book, is a predicate-argument structure of dependencies where the leftmost element
is a predicate. For example, john'mary' is a dependency structure but not a PADS,
and sleep'john' is a dependency structure and a PADS. As we shall see throughout the
book, PADS is different from the logician's logical form, from the transformational
linguist's logical form, from the dependency structures between words of a string,
and from model-theoretic objects. It is lexically determined and projected. It depends
on the syntactic type in crucial ways, codetermines it in crucial ways, and it is indeed
a nonassociative structure: (hurt'love')mary' is different from hurt'(love'mary').
In the first case, mary' cannot be construed to have an individual relation to the other
elements. Likewise for hurt' in the second case. The first one might arise from an
expression such as Mary thinks that love hurts, and the second one from Mary's love
hurt John.

For example, in sleep'kafka', sleep' depends on kafka'. In like'milena'kafka',
there are two dependency relations: like' depends on milena', and like'milena' depends
on kafka'. These embeddings follow from the left-associativity of juxtaposition,
which we can show as:

        .
       / \
      .   kafka'
     / \
 like'  milena'

The relation can be abstracted over. For example λx.sleep'x abstracts over the
argument of a dependency, and λf.f kafka' over the function.

We can take the tree above to signify the obliqueness of the arguments of the
predicate in the prefix. We can say that a leaf node that c-commands another in the
PADS is less oblique. That is one of the reasons why we consider PADS to be
a structure rather than a flat list. There would be no obliqueness relation for the
leftmost element of a PADS. For example slept'john' does not manifest obliqueness.
We can also say that a predicate "sees" its arguments one at a time in PADS: the
elements that c-command the leftmost element in its PADS are its arguments. In the
example below, the arguments of p are a, (bc) and d, not a,b,c and d.


            .
           / \
          .   d
        /   \
       .     .
      / \   / \
     p   a b   c
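
The way a predicate "sees" its arguments one at a time can be sketched by walking down the left spine of a PADS. The encoding of juxtaposition as nested (function, argument) pairs and the function name spine are assumptions for illustration.

```python
def spine(pads):
    """Split a PADS into its leftmost predicate and the arguments it sees."""
    args = []
    while isinstance(pads, tuple):
        pads, arg = pads       # left-associative: (function, argument)
        args.append(arg)
    return pads, args[::-1]    # arguments in the order they were taken

# p a (b c) d, i.e. ((p a) (b c)) d
example = ((("p", "a"), ("b", "c")), "d")

# The two nonassociative readings discussed for hurt', love' and mary'
first = (("hurt'", "love'"), "mary'")     # (hurt'love')mary'
second = ("hurt'", ("love'", "mary'"))    # hurt'(love'mary')
```

Here spine(example) returns p with the three arguments a, (b c) and d, and the two bracketings of hurt', love' and mary' come out with different argument lists, which is the sense in which the structure is nonassociative.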

We shall see in the book a combinatory equivalent of arity and argument struc- 
ture specification, without the need of another primitive such as c-command. Order 
and its semantics will be doing the work, rather than auxiliary assumptions. Notice 
that we have already obtained the result from juxtaposition that obliqueness relations
are asymmetries; no two arguments can c-command each other in the notation
pred' arg'1 arg'2 ... arg'n.


Notes 


Enjoy the silence, Depeche Mode. Lyrics by Martin Gore.

Songbooks are not the right sources for Fraser words. They would be reinterpretations.
Try a live performance or soundtrack of Cocteau Twins, with Liz
Fraser singing her own words in the truest sense.


Dissemination by personal contact seems to be the fate of Schönfinkel's work.
His other paper, the only other work he published, Schönfinkel (1929), was also
prepared for publication by a colleague, Paul Bernays, who was in Göttingen at
the time of the 1920 seminar. Curry's personal notes reveal that Bernays helped
Behmann with the preparation of the 1924 article as well.


Besides the theory of Combinatory Categorial Grammar there is also the subfield
of Planning in Artificial Intelligence which makes heavy use of the semantics
of adjacency, not to mention the most rapidly growing community in
computing, the functional programming community including Lisp, Haskell,
Javascript, Python, Ruby among many others. All of these make use of combinators.
There is also a real computer architecture called SKIM (Clarke et al.
1980) in which the only primitive instructions are Schönfinkel's combinators.


For independent discovery and rediscovery of the principles involved, see Frege
(1891), Quine (1966), de Bruijn (1972).


For brevity, I use pefo' as an abbreviation for the semantics of persuade every
friend of. Likewise tvf' for to vote for. For the semantics of (7d) I follow Hoyt
and Baldridge (2008): the '?' operator is variable-binding for the question Q.
We shall see in the book that the research program of CCG is not to eliminate
the variables such as x in this example, although it is certainly doable by
combinators.


There is another way to make complex symbols out of simple values. Feature-based
theories of syntax follow this path. For example, we can employ subtyping
as in HPSG (Pollard and Sag 1994) to define e.g. clausal-subject and
nominal-subject as subtypes of subject above. These types can be made part of
a theory of features for a combinatory system too, as Beavers (2004) and McConville
(2006) do. They are, however, different from the combinatory syntactic
type in not having a strictly sequent semantics.


From a psycholinguistic perspective, the only recourse would be slips and false
starts, in which some new types would be involved rather than a reinspection of
the used types. Thanks to Belma Haznedar for clarification.


The entries in (9c-f) must be related because they arise from the same lexeme.
This has to do with syntax-phonology-morphology interface and theory of the
lexicon. These aspects are not covered in the book in detail.


It was Göksel (2006) who first observed the anaphoric behavior of the plural
and possessive suffixes.

Coordination of unlike categories, such as John is a republican and proud of it, 
does not have true coordination semantics: *John is proud of it and a republi- 
can. We should also be wary of an accidental capture of coordination by like- 
categories: John bought a beer and drank it, versus John drank it and bought a 
beer, which has a different meaning. This is an early warning that syntax alone 
cannot account for all semantic constraints. Jacobson's (1999) warning for the 
insufficiency of the like-category constraint for CSC and its across the board 
consequences arises from another semantic problem, that of reference, and it 
raises similar concerns for variable-friendly syntax and semantics; see Chap- 
ter 6. 

I write some of the left-hand sides in single quotes because, strictly speaking, 
they are not syntactic types in the combinatory sense; they can be thought of as 
the morphological sources of the syntactic types. 

I will follow in the book a common view that morphological types relate to 
form, syntactic types to constituency and semantic types to interpretation. The 
distinction is crucial for many theories such as the Separation Hypothesis in 
morphology (Beard 1995), in which morphological types do not “see” semantic 
types. Distributed Morphology (Halle and Marantz 1993) suggests that it is the 
phonological material that cannot see the other kinds of information because it 
is inserted after the syntactic process. As all theories agree that syntactic and 
semantic types must see each other to do compositional semantics, this issue is 
not too critical to combinatory syntactic types, which is our main focus. 

We are presently assuming that love' is not innately typed; otherwise we would
know without exposure that it is (e, (e,t)) semantically. This knowledge is ac- 
quired. An implication of the present discussion to word acquisition is that the 
complete interpretability of words is an intrinsic part of their knowledge, rather 
than innateness of e.g. the transitive construction or some universal argument 
structure. This knowledge, which we might consider to be the syntactic reflex 
of the child's cognitive burden of attempting to make sense of the world, nar- 
rows down the search problem in language acquisition. The relation of the task 
to syntactic types is implicit in e.g. Siskind (1995). We shall see more detailed 
examples in §9.5.

This view has been advocated much earlier, and it was severely underappreci- 
ated by the transformational grammarians. Halliday (1966, 1970), Halliday and 
Hasan (1976) had not claimed that there is no structure in language, in written 
or oral text, only that we should look for it where it really mattered, and where 
it can be observed to be at work. 

A variant of transformationalism such as that of Kayne (1994) in which order 
is seen as a reflex of structure might seem similar to CCG in comparison to 
other theories mentioned. The two programs are not compatible because CCG's 
conjecture amounts to saying that structure is a reflex of order, an opposite
conclusion with respect to Kayne. Likewise, Hawkins (2001)-style adjacency
effects to explain the category adjacency rely on structural domains minimized 
on structural aspects, which presupposes structure-dependence rather than com- 
binatory type-dependence. Notice also the crucial use of movement in Kayne’s 
hypothesis to claim a universal subject-verb-object word order, which compro- 
mises the use of adjacency for syntax and semantics. 

It was explicit in Curry's notation before he met Schönfinkel (1920/1924) in
a literature search. Curry (1927) in his personal notes translates for example
Schönfinkel's la to his then-current notation |a. Curry (1929) notes that, in
(xy) nothing is said if x is not a function, and suggests taking such (xy) to be 
equal to Kxy, which we will follow. 

If f is binary, a purported definition such as f(x) can be understood to be lossy 
only by investigating the "body" of f, which would make use of another argument,
say x2. Similarly, f(x1,x2,x3,x4) could be found to be too liberal if the
body of f makes no use of say x3 and x4. 

Both cases treat arity as an illative notion rather than a stipulative property of 
f. This may be a better way to proceed in cognitive science rather than the ax- 
iomatic approaches to argument-taking commonly assumed in computing and 
linguistics, provided that we can manage to keep infinite regress under control 
and stay empirically sound at the same time. For example, no language has 
manifested a ditransitive sleep predicate where only two arguments are syntactically
available, and no language has a syntactically argumentless verb. These
facts want explaining rather than stipulation. 

Penrose's more famous conjecture, that the human mind is noncomputable, is 
not relevant here because a theory to predict possible languages would not be 
a theory to predict possible minds, unless of course we believe language is all 
there is to mind. 

Chomsky (1965: 62): “[..] It is important to realize that the questions presently 
being studied are primarily determined by feasibility of mathematical study, and 
it is important not to confuse this with the question of empirical significance." 
For our purposes, it suffices to note that the primitive recursive languages are 
languages of functions as programs which can be written without indefinite 
looping such as “repeat” or “while”, and where the notion of “next instance" 
plays a crucial role. Not all recursive languages are primitive recursive, for ex- 
ample the Ackermann function. 

Chomsky (1965: 208:fn.37): "This possibility [that the least powerful empiri- 
cally adequate theory might turn out to be equivalent in weak or strong gener- 
ative capacity to Turing machines] cannot be ruled out a priori, but in fact, it 
seems definitely not to be the case. In particular, it seems that, when the theory 
of transformational grammar is properly formulated, any such grammar must 
meet formal conditions that restrict it to the enumeration of recursive sets." 
Levelt (1974) is more explicit. He equates descriptive adequacy of a theory
with providing linguistic grammars that stay within recursive grammars, and
explanatory adequacy with providing primitive recursive grammars, i.e. there 
must be a way to see how the grammar is caused. Thus for Levelt, any grammar 
for a natural language must be decidable. 

Infinite regress is not a concern here because it can be avoided so long as we do 
not ask for the entire solution space at once. Consider a Putnam-Gold machine
M₁ which takes another Putnam-Gold machine M₂ as input. M₁ can leave an
initial answer on the result tape, and reconsider its output if similarly operating
M₂ changes its output. Both M₁ and M₂ will have fetchable answers at any time,
although they may both be undecidable. What we cannot have is M₁ asking
whether M₂ has stopped and delivered all its results. The process is reminiscent
of lazy evaluation in programming languages, although they arise from different 
concerns. 

In a nutshell, the most demanding task in the execution of a program is access 
to names. As most programming languages allow nested definition of names, 
the task is exacerbated by the look-up of names which are not local to the 
currently executing subprogram but defined elsewhere. The theory of compil- 
ing has found ingenious methods to tackle the problem. The problem becomes 
a nonproblem when there are no variables. With this in mind, programming 
language design and compiling become the art of translating a programmer's 
specification, which includes variables for the benefit of the programmer, to a 
variableless executable code. 

The statement is attributed to Merrill Garrett by Fodor (1983). Chomsky 
(2000: 124) considers it problematic: “The belief that parsing is “easy and 
quick;" in one familiar formula—and that the theory of language design must 
accommodate this fact—is erroneous; it is not a fact." He considers it to be a 
performance issue, and needlessly complicating a competence grammar since 
parsing according to him is not its business. It is not clear to me what Chomsky 
means by “design” in a product of evolution, but other conceptions of com- 
petence, such as Bresnan and Kaplan's (1982b) Strong Competence Hypothesis 
where the performance grammar just follows the instructions of the competence 
grammar, or Steedman's (2000b) Strict Competence Hypothesis where compe- 
tence grammar is the performance grammar, take more burden of proof on their 
shoulders than Chomsky, by taking Garrett's remark as an empirical observa- 
tion about grammar. This is essentially the view adopted in Levelt (1974: 236) 
as well: “The data for competence research are linguistic judgments, which are 
forms of language behavior. It is not clear why just this type of language behav- 
ior (linguistic judgment) should have the privilege of leading to a theory.” 

It should come as no surprise that one of the earliest objections to semantic
vacuousness of some words is from one of the most prominent semanticists and
phonologists of the 20th century, Dwight Bolinger (1977).

Ades and Steedman (1982) and Szabolcsi (1983) appear to be the first syntacticizations
of this kind. Geach (1972) is a syntacticization of composition as well,
from the perspective of set theory, following Quine. Up until Steedman (1985,
1988), Szabolcsi (1987a), CCG developed independently of Schönfinkel's and
Curry's combinators.

Note that (A/B)/C: f is a two-argument function whereas A/(B/C): f is a one- 
argument function. 

Smullyan pays homage to Schönfinkel and Curry in the choice of species 
as well. The book is dedicated to Curry, an avid bird-watcher. Smullyan 
(1985: 241) has his inspector Craig’s trusted friend Fergusson cook up a story 
that Schönfinkel means "beautiful bird" in German. The Yiddish suffix "el"
adds a morphological mystery to ornithological logic. 

The first appearance of the paradoxical combinator in publication is Rosen- 
bloom (1950), who called it O. Curry had worked on this combinator since 
1929. 

Notice that the mismatch arises from the assumed sameness of semantics for
X/Y and X\Y, viz. b. As explained in the introduction, we can have A → C/B
and B → C\A, if we know that A and B in this order derive C, i.e. A B → C.
This equivalence spells the correspondence X/Y: b → X\Y: λa.ba, which can
arise from the configuration Y: a X/Y: b → X: ba.


A system is called applicative if it uses application as the only primitive.

The two combinators are obviously related. Curry and Feys (1958) give the
following equivalence: V=®(®(®B))B(KK). The K's symbolize gapping.


Smullyan's (1985) Eagle (E) takes five arguments, like his Dickcissel and
Dovekie, and they are the least visited birds in his book (also the seven-argument
giant, the Bald Eagle).

Y = SSK(S(K(SS(S(SSK))))K). It is cumbersome, but it does the job. 

We might go one step further and derive S and K from Barendregt's (1984) 
combinator X, but not without some circularity. Take X = λx.xKSK. Then
XXX = K, and X(XX) = S. The bottom line is, if we want the complete elimi- 
nation of variables, we need the S and K somehow; witness KSK in X. 
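
The two reductions for X can be checked mechanically with curried functions. This is a minimal sketch: the Python names K, S and X simply mirror the combinators, and object identity (is) is used as a stand-in for equality of combinators, which works here only because both reductions end by returning the very K and S objects that X was built from.

```python
K = lambda x: lambda y: x                      # Kxy = x
S = lambda f: lambda g: lambda x: f(x)(g(x))   # Sfgx = fx(gx)
X = lambda x: x(K)(S)(K)                       # X = λx.xKSK

xxx = X(X)(X)      # XXX, left-associative
x_xx = X(X(X))     # X(XX)
```

After evaluation, xxx is the K object and x_xx is the S object, and both appear inside the definition of X itself, which is the circularity the note points out.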

Further optimizations are possible, for example using BCS or CD® to elim- 
inate the unnecessary proliferation of S abstractions; see Curry and Feys 
(1958: 188ff), Turner (1979). 

I suggest the name O to symbolize its internalized lambda, and to acknowledge 
that it turned out to be different from D in discussions with Umut Özge, Jason
Baldridge and Frederick Hoyt. This combinator was named D by Hoyt and
Baldridge (2008) with the same semantics and syntax covered here. I proposed 
to change the name to avoid confusion with Rosenbloom’s (1950) D, which has 
different semantics and syntax. 

Take f to be λy.yb, for some b. Then, for some a, we have (λx.f(g(hx)))a =β
g(ha)b, but f(λx.g(hx))a =β g(hb)a.
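
The difference between the two terms can be confirmed by evaluating them. In this sketch the constants a and b are plain strings, and g and h are assumed to be uninterpreted constructors that only record what they were applied to; these encodings are illustrative choices, not part of the note.

```python
a, b = "a", "b"
h = lambda u: ("h", u)                     # h applied to u
g = lambda u: lambda v: ("g", u, v)        # g applied to u, then to v
f = lambda y: y(b)                         # f = λy.yb

composed = (lambda x: f(g(h(x))))(a)       # (λx.f(g(hx)))a
applied = f(lambda x: g(h(x)))(a)          # f(λx.g(hx))a
```

The first term ends up as g(ha)b and the second as g(hb)a, so the two arguments a and b change places depending on where the abstraction sits.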

The theory began with Ades and Steedman (1982), written in 1979. Steedman
developed the theory in a series of papers (Steedman 1985, 1987, 1988, 1990a,b,
1991a,b, 2000a). Synopses can be found in Steedman (1996b, 2000b), Steed- 
man and Baldridge (2011). 

On a historical note, the interdefinability of combinators was dealt with in a 
special section of Curry and Feys (1958), written by William Craig. Smullyan’s 
(1985) engagement of a chief inspector of the same name to tackle ornitholog- 
ical affairs acknowledges this somewhat neglected contribution. In linguistics, 
interdefinability is prominent in Anna Szabolcsi’s (1983, 1987b, 1989, 1992) 
work. She was principally involved in bringing S syntax to explanations, which 
was identified by Steedman to arise from S semantics. 

The idea was influential in the structure-dependent theories as well, starting 
with early transformations. It is most formally dealt with in Pollard (1984). 
Bach (1984: 7) defines the semantics of persuade in Montagovian terms: "persuade
is interpreted as denoting a function from properties to a function from
terms to sets". The property translates to a VP, and the function from terms to 
sets is a transitive verb, i.e. (S\NP)/NP, hence the need for surface wrap. 
Sometimes the distinction is attributed to proof-theoretic versus model-theoretic 
approaches to syntax, but this is slightly misleading. It is true that CCG is a 
combinatory theory of adjacency syntax, rather than a set-theory of linguis- 
tic constraints. The words are the models though (assuming no words with 
Y semantics), because every constraint on a word's syntactic-semantic behavior 
must be reflected in its lexical category, hence any Montague-style valuation in 
a model frame can be reduced to truth conditions for sentences. Type-Logical 
Grammar leaves some proof-theoretic results, such as the provability of cross- 
ing compositions in CCG, to models. 

That is to say they are not Aristotelian categories. Husserl's categories are open- 
ended, and they do not rely on a set of basic categories determined a priori. 

Steedman (2000b: 54) defines these principles as follows. Consistency: “All
syntactic combinatory rules must be consistent with the directionality of the
principal function”. Inheritance: “If the category that results from the
application of a combinatory rule is a function category, then the slash defining
directionality for a given argument in that category will be the same as the 
one(s) defining directionality for the corresponding argument(s) in the input 
function(s).” 

The claim here is that (20b) is ungrammatical with the intended coordination 
reading but fine as a parenthetical. 

See Baldridge (2002), Baldridge and Kruijff (2003), Beavers (2004), Mc- 
Conville (2006) for comprehensive attempts at a feature geometry for CCG. 

The stronger sense of radical lexicalization and its effects on constructions and
constituency can be observed when we compare related grammar theories. Con- 
sider some cherished Construction Grammar examples below, quoted by Gold- 
berg (1995) as part of the crucial data in her book’s opening. 

i. I loaded the hay onto the truck. Anderson (1971)

ii. I loaded the truck with the hay.

Example (i) is claimed to semantically differ from (ii) over and above the
meanings of the lexical items involved, where (ii) implies full loading in some
sense, and (i) does not. No such difference seems to follow from the same
construction with different lexical items (iii-iv):

iii. I loaded the CD onto the multi-cd player.

iv. I loaded the multi-cd player with the CD.

Moreover, we need to account for the following effect, where fullness or
partialness of the readings seems to be restored across the board because of
constituency:

v. I loaded the truck with hay and the multi-cd player with CD.

There is a prediction of CCG about this construction which awaits research. If 
the maximum arity in any lexicon is n, then the power of B must be bounded 
by n-1 to stay within the class of efficiently parsable linear-indexed grammars, 
therefore n+1-sequent verbs of subordination is all it can handle. Steedman 
(2000b) suggests that n=4 for English. This issue brings back Shieber’s (1985) 
warning that considering the possibility of bounded crossing reduces all linguis- 
tic arguments to finite structures. The book has already steered toward that di- 
rection by saying that something can be finite but vast, and we would still need a 
linguistic theory to sieve through possible structures. Following this route would 
not fall into the fallacy of turning to regular expressions as linguistic theories. 
The Kolmogorov-Chaitin complexity of describing all and only the possible 
structures with them would be prohibitive, and it would not amount to a theory. 
We would expect a theory to be much shorter than what it descriptively covers. 
Whether finite or infinite in their stringsets, the languages seem to manifest 
limited constituency and dependency. A language can be infinite in terms of its 
stringset but finite in terms of possible structures, as for example free opera- 
tion in syntax (i.e. closure) might suggest. Given these aspects I consider the 
infinitude argument secondary in linguistic explanation. 
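
The bounded powers of B mentioned in this note can be made concrete. The following is a minimal Python sketch under stated assumptions: `B_n` and `max_B_power` are hypothetical helper names, functions are curried, and terms are encoded as tuples for inspection. It illustrates generalized composition, B^n f g x1...xn = f(g x1 ... xn), and the n-1 bound stated above; it is not a parser.

```python
def B_n(n):
    """Curried generalized composition: B^n f g x1 ... xn = f(g x1 ... xn)."""
    def with_f(f):
        def with_g(g):
            def collect(partial, remaining):
                # Feed g its arguments one at a time, then hand the result to f.
                if remaining == 0:
                    return f(partial)
                return lambda x: collect(partial(x), remaining - 1)
            return collect(g, n)
        return with_g
    return with_f

def max_B_power(max_arity):
    """Bound on the power of B for a lexicon whose maximum arity is max_arity."""
    return max_arity - 1

# Illustrative terms: f is unary, g takes two curried arguments.
f = lambda v: ("f", v)
g = lambda x: lambda y: ("g", x, y)

print(B_n(2)(f)(g)("a")("b"))
print(max_B_power(4))
```

With n=4, the value the note reports for English, the sketch gives a bound of B^3.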

Szabolcsi (1983) called it connection; recall Schönfinkel's name, fusion, for
the same effect. Steedman (1988) related connection to S.

Szabolcsi (1989) might appear to introduce unary B to English syntax, but she 
does that only for syntactic objects, hence it is a lexicalization of unary B. 

Having two categories for dymuno ‘want’ in (36–37) is empirically sound; the
same differences can be observed in control verbs of other languages, for exam- 
ple English and Turkish: The hair wants cutting, and Wittgenstein wants to like 
Russell. They might arise from a single category of want! but that is a matter 
of argument structure and the lexicon. 
The ongoing discussion of observing the combinators' semantics in syntax
must be distinguished from similarly inspired operator-based systems, i.e. sys- 
tems which relate two expressions by the use of combinators, such as that of 
Shaumyan (1987). For example he notes that That man, I hate him, with the se- 
mantics hate'x thatman'i', where x is presumably the pleonastic use engendered 
by him, is related to I hate that man, by K. Its semantics is hate'thatman'i. I 
have nothing to say about such systems except to note that they need some no- 
tion of synonymy, and run into the same difficulties that face the any-debate on 
undecidability; see §3.3 for Hintikka's (1977) synonymy argument.

A point of clarification: a lexical rule in CCG means a unary rule that only refers 
to substantive—therefore lexical—categories. It does not mean a rule that gives 
us more lexical items. 

This chapter arose from discussions with Umut Özge. Usual disclaimers apply.

Cf. fn. 24, where simplifying the use of bound variables for the benefit of the
programmer is claimed to ease the task of software planning. 

That discourse is perhaps necessarily involved in such examples is evidenced by 
the proposals that can provide their bound interpretation in syntax, such as that 
of Pinkal (1991: (12)): “A NP α can bind a pronoun β provided that β is in the
c-command domain of the host quantifier of α's discourse referent.” Without
an analysis of the English genitive, it is not clear how such examples might be 
accounted for by Jacobson's variable-free semantics. 

The idea of type-raising all arguments in a grammar seems to go back to Montague
(1973), Lambek (1958, 1961). Montague's set-theoretic type e is empty.
His subjects must be ((e,t),t). Lambek’s radical lexicalization translates all NP
types in a phrase structure grammar to their grammatical roles, i.e. to their type- 
raised variety. 

In a VSO language we cannot maintain in surface structure that the least oblique
argument is the last one to combine. Keeping this as a universal was one mo- 
tivation for Dowty (1996) to abandon the adjacency assumption of CCG and 
adopt a surface-wrap analysis. 

The rule (28) has the same result semantics as z-NP, which can be verified
from its configuration: X/Z/Y: f  Z: a  Y/Z: g → X: f(ga)a, where Z=NP,
and f has an inner semantic (lexical) wrap. Crucially, the rule avoids the
unary S semantics of λgλx.fx(gx), against which Jacobson (1999: 136) warns
us to eliminate His_i mother loves every Englishman_i. However, the rule (28)
would produce $/ NpNP WP3, for loves, therefore it would derive Mary loves 
him wrongly, unless verb-medial languages by-pass the rule by some ‘same di- 
rectionality’ constraint on the |’s, with predictable consequences for OSV lan- 
guages such as Hixkaryana. Clearly there are restrictions on the syntactic type 
of f, ‘|r and ‘| j related to the crossover phenomena, which must remain cur- 
rently as open questions. 

For example, tag questions require a pronoun: John will come, won't he? Welsh 
periphrastic passive requires a pronoun as an independent word. Steedman’s
model eschews the use of a distinct syntactic type for pronouns. Therefore in 
such constructions, the pronoun is predicted to be the head which can look for 
the arguments. 

The degrees of freedom afforded by CCG in this domain are worth reiterating. As
Steedman’s (2011) LF eschews an exponent type in syntax, it cannot require it in 
a syntactic domain of locality. However, an analysis which takes the possessive 
pronoun as the head rather than cael is logically possible, and it will avoid an 
exponent type in the domains of locality. Such variations await research. 

This lack of interaction between the parser stack and the quantifier store is 
most evident in the recent formulations of Cooper storage, such as in Pollard’s 
(2008b) reworking of extended Montague Grammar to Convergent Grammar. 
His construal takes as its fundamental assumption the lack of an interaction. 

To be more precise, there is no subject reflexive that can have an antecedent in
the same clause. There are languages in which a subject “reflexive” can take an 
antecedent from a higher clause. 

We can take the last sentence of the quote to suggest that the number of distinct 
PADS objects in a mental grammar is probably less than the number of syntactic 
objects, whereas the number of PADS tokens is probably higher, so that they are 
forced to recycle among the lexical entries to provide a network of relations. 
CCG is not designed to cope with such networks. 

Notice that, if the string contains a syntactic displacement, say the cat which I 
think sleeps is a menace, where the substring ‘sleeps is’ clearly does not embody
an argumenthood relation between the two objects on either side of it, the syntax
of the other combinators involved will take care of the semantic dependencies to 
get the sleep’ and cat’ argumenthood right. The point of combinatory argument- 
encoding in a string of objects is that what cannot be torn apart and displaced 
separately is the B'Is/eep’ part, which comes from the lexicon. 

a, b, g, h, i are from Baldridge (2002), c is from Steedman (2000b), and j is 
Steedman (p.c.). The use of > Ox, < Ox, >S” and <S” awaits further inquiry. 
I take en to symbolize a syntactic feature such as Fen, where Zen is assumed 
for cael. 

If we are told that semantically speaking the cael involved in the passive is not 
the same as active ‘get’, we can readjust our analysis to Jacobson-style pronouns 
and demand the passive cael to look for an NpNP argument rather than NP. 
The radical lexicalization of the passive using the possessive pronoun is fur- 
ther supported by the fact that 3sg form (ei) soft-mutates the uninflected verb, 
whereas 3pl (eu) does not (Awbery 1976: 49). The analysis also coincides with
Awbery's intuition that the phrase after Wyn in the example is a term of cael: 
notice the final dependency structure. 

The across-the-board claim for the passive in languages of the world is that it is 
an operation which targets lexical verbs, and that that might be the reason why 
it always targets the least oblique argument for demotion because every lexical
verb has one. However, this line of reasoning does not explain why the passive 
promotes certain objects and not others. Our focus here is to work towards an 
explanation for its (clause) boundedness. 

Lexical access to thematic structure does not in itself fully characterize the pas- 
sive in relation to the reciprocal, causative and the reflexive. The property is 
proposed here as a necessary but insufficient condition to pin down the
semantics of the passive.

There are exceptions in the verb-medial languages as well. For example, bare 
complements make it easier to break the word order constraint: The cat which 
I knew (*that) would be a menace is Carlyle; see Steedman (1996b) and 
Baldridge (2002) for extensive discussion. 

Examples such as (18a) are sometimes considered ungrammatical by some 
Turkish syntacticians on the grounds that they are odd without a context. Since 
there is no such thing as null context, and because a competence grammar must 
provide a derivation no matter how unlikely a meaning is if it is grammatical, 
we must keep such examples on the agenda. To see that (18a) is grammatical, 
consider a case where the topic is Ahmet's strange shooting practices. 

For example: use S’ rather than NP if the argument is clausal, use an NP rather 
than S’ if the semantics of the construction is participatory therefore lexically 
visible, as in the passive. 

Some of the material in this section arose from discussions with Mark Steed- 
man. I present here my recollection and conclusions. Possible misunderstand- 
ings are mine. 

This is true of any kind of computation, not just CCG. For example, a common 
practice in programming language compiling is to replace tail recursion with 
simple iteration. This optimization cannot be done for nontail recursion, which 
would be the true reflection of Y in the syntax of a language. 
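
The distinction drawn in this note can be illustrated with a small sketch. The Python below is illustrative only; the function names are hypothetical, and CPython itself does not perform this optimization, so the loop is written out by hand to show what a compiler would produce. The first two functions compute the same value; the third shows recursion in which work remains after the recursive call returns, so it cannot be rewritten as a simple loop without an explicit stack.

```python
def count_tail(items, acc=0):
    """Tail-recursive: the recursive call is the last thing done."""
    if not items:
        return acc
    return count_tail(items[1:], acc + 1)

def count_loop(items):
    """The same computation after replacing the tail call with iteration."""
    acc = 0
    while items:
        items, acc = items[1:], acc + 1
    return acc

def depth(tree):
    """Non-tail recursion: 1 + max(...) is still pending after each call."""
    return 1 + max((depth(child) for child in tree), default=0)

words = ["the", "cat", "sleeps"]
nested = [[], [[]]]  # a tree written as nested lists

print(count_tail(words), count_loop(words), depth(nested))
```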

Finitude is certainly not a mental block to creativity. Pullum and Scholz (2009) 
suggest that Japanese haiku compositions can continue forever because the
possibilities are finite but vast: up to 10?^ haikus, and certainly far fewer
due to other constraints, but still a vast number.

It is quite striking that the two philosophers who sharply differed from their
precursors and contemporaries in ascribing to animals skills that are only different
from humans in degree rather than in kind, Hume and Wittgenstein, essentially 
saw a continuous problem space for coordinated action and experience of living 
things. These fresh perspectives rightfully established them as the philosophers 
dearest to some cognitive scientists. 

The parts of the lexicon that are not visible to syntactic processes are formal 
knowledge of words such as the word-formation rules of Anderson (1992), 
Aronoff (1994). They do not necessarily depend on syntactic types. A morpho- 
logical theory must explain these processes by giving us a landscape of possible 
morphological types.

The question of coordination being asymmetrically sensitive to the left or right 
conjunct is dealt with in Steedman (2000b) from a CCG perspective. 

Examples (21–22) are from Özge and Bozşahin (2010).

I assume that morphology-phonology at the interfaces handle -en versus ge-V- 
en alternation in Dutch passive morphology (including the choice of -d or -t 
in place of -en), and yield a morphemic segment which I symbolized as -EN 
above. In this process there would be no involvement of its syntactic category, 
assuming the Separation Hypothesis of Beard (1995). The syntactic types do the 
ordering of combination in the syntactic process. For example, the particle op- 
‘up’ in opgestegen might be the source of telicity as van Hout (2000) suggests, 
and this would be carried over to syntax by the syntactic type of the lexical item 
which we can symbolize as OP-. 

For purists, we can assume that everything in the lexical conceptual structure is 
projected onto PADS but only a few members of the powerset are used by syntax,
which shows the need for a theory of the lexicon although the powerset is in all 
likelihood finite. 

As Rey (1986) reminds us, another computationalist trend, strong AI, is simi- 
larly accused wrongly about its aspirations of computationalism, which is func- 
tionalism, not behaviorism. 

All CCG learners work within the parse-to-learn paradigm. The alternative, 
which is the learn-to-parse paradigm, seems inconsistent with Garrett’s obser- 
vation reported earlier that parsing is a reflex; see Fodor (1998), Steedman and 
Hockenmaier (2007) for discussion. The no-parse paradigm relies on lexical lookup
of words, and it presumes a more or less disambiguated lexicon for the child, 
which does not seem very realistic. 

In earlier work (Çöltekin and Bozşahin 2007) we called D the likelihood. We
thank Orkan Bayer for pointing out our error. 

This notion of “having a meaning” is not related to Quine's use of the same
term for grammars, as Chomsky never tires of pointing out; see e.g. Chomsky 
(2000: fn.18:199). 

To be more precise, Siskind's cross-situational learning emphasizes the likely 
meanings of words rather than possible meanings, the latter of which Quine 
argued to be infinitely many. A similar narrowing of word meanings is defended 
from a linguistic perspective by Williams (1994). 

Many of Quine's possible readings are eliminated by the parsimony principles
of Siskind (1996). The list provided is only a first approximation for this pro- 
cess. For example, from Siskind's principle of exclusivity, the child can con- 
clude that chocolate does not mean whatever she assumed for plu’ because in 
the first experience there is the plural assumption but no chocolate. 

I am grateful to Aravind Joshi for related discussion. 

Thanks to Alan Libert for these examples. I am responsible for the lexicalization 
claim.

There is something other than case marking and word order that comes to the 
rescue in the recovery of grammatical relations: agreement systems and noun 
classes; see Steele (1978), Mallinson and Blake (1981). Notice that, in a system 
of combinatory syntactic types, these morphological resources narrow down 
the syntactic types just like case marking, without different levels of structure 
or subsystems. 

There is a certain myth about scrambling languages. If asked in isolation, a 
speaker might say that a legitimately permuted sentence means more or less 
the same thing as the unpermuted ones. But this is hardly the right question. 
Provide theme or rheme alternatives before the example, and most speakers 
would prefer one word order only, if the alternatives are set to elicit that order. 
More interestingly, they would reject most of the others as either ungrammatical 
or contextually inappropriate, suggesting that there are other semantic reasons 
than who-does-what-to-whom. 

Here I assume a tripartite functional division of labor in parsing, following 
Steedman (2000b): (a) a grammar, (b) a parsing algorithm to derive strings us- 
ing the grammar, and (c) an oracle to choose between the alternative derivations 
and potential ambiguities. 

These examples might be considered odd in a null context, but certainly not 
ungrammatical. They are perfectly interpretable in for example a partitive con- 
text in which there was a children's party where several delicacies were served, 
chocolate among them. I deliberately avoided the aorist sever ‘loves’ to rule 
out generic readings yet still maintain the indefinite ones. See Özge (2010) for
more examples of indefinite accusatives, and for an argument that, in Turkish 
studies so far, pinning down the semantics of definiteness and specificity to the 
morphemes has not been very successful. 

A definite reading can be obtained in response to the question: What did the 
kids think of the sweets we served? The indefinite reading may follow from the 
question: Can we say that we made all the guests happy? The issue is
unsettled; see Nakipoğlu (2009), Özge (2010) and references therein for extensive
discussion. 

For an informative and entertaining exposure to monads and for their relation to 
interactions in computation, see Wadler (1997), who relates them to Descartes's 
mind-body problem. 

From the song Flaming in Pink Floyd's 1967 album, The Piper at the Gates of 
Dawn. Lyrics and music by Syd Barrett. 

This is similar to French enchaînement, for example faux ami [fo][za][mi], but in
the backward direction.

The first two correspondences of (11) and the first one in (12) highlight an 
equivalence on the semantic side modulo eta-conversion of lambda calculus. 

It is explicit in any model of CCG that the bootstrapper for acquisition cannot 
be just phonological or semantic; it must be grammatical because a lexicalized
syntactic type is the only way to establish the correspondence of a string with 
some predicate-argument structure. See Steedman and Hockenmaier (2007) for 
discussion. 

There is a combinatory equivalent of η's ordered pair constructor (c,x), which
is based on D» and Z, in Curry and Feys (1958). I eschew it here for shorter 
exposition. 

μ is commonly formulated as BSC fg = λx.f(gx)x in the reader monads, as in
Shan (2001), which in our case is BSCa(dU) = λx.a(dUx)x. The order of the
dependencies (dUx) and x is not critical in the monad; order is already encoded
in the input by η. BSC does not encode a dependency which is functionally
different than that of S, hence my choice of the better-known combinatory term
for μ in (14).
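
The equation in this note can be verified with curried functions. The following Python sketch is illustrative, not part of the original text: B, C and S are given their standard definitions, and f and g are hypothetical symbolic stand-ins that build tuples so the resulting dependency structure can be inspected.

```python
# Standard combinators, curried.
B = lambda f: lambda g: lambda x: f(g(x))        # B f g x = f (g x)
C = lambda f: lambda x: lambda y: f(y)(x)        # C f x y = f y x
S = lambda f: lambda g: lambda x: f(x)(g(x))     # S f g x = f x (g x)

# Symbolic stand-ins (assumptions for illustration): f is binary, g is unary.
f = lambda u: lambda v: ("f", u, v)
g = lambda x: ("g", x)

bsc_result = B(S)(C)(f)(g)("x")   # f (g x) x
s_result = S(f)(g)("x")           # f x (g x)

print(bsc_result)
print(s_result)
```

Both results pass x and gx to f; they differ only in the order of the two arguments, which is the sense in which BSC encodes no dependency functionally different from that of S.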

The monadic version coincides with Jacobson's (1999) use of composition as a 
sequence of unary B followed by application, which is generalized in monadic 
grammar to apply to all combinators. 

Another excursion of Curry to linguistics is Curry (1929), where he defends the 
grammarian's view of meaning over the logician's view of meaning. 

This is unlike Locke's naive empiricism. We cannot assume tabula rasa for de- 
pendencies. And we must assume a syntactic specialization of combinators. 
Hume has always insisted that human beings bring something special to their 
understanding, and that they cannot help themselves attributing for example a 
causal link when there is no causation. In other words, some things are internal- 
ized to the point of a reflex. 

We owe our current understanding of the Baldwin effect to Simpson (1953). 
Baldwin (1896) thought he had found a new cause for selection, which he called 
organic selection, in addition to natural selection. Simpson identified it to be 
an effect rather than a cause, and coined the name. The Simpsonian view of 
Baldwin is what makes Deacon's proposal tick. There seem to be coextensive
but not separate mechanisms for selection. 

It seems to be a major point for Bickerton and Chomsky (2000) that human 
evolution has more or less stopped—remember Chomsky's claim that language 
is a perfect system, and that only languages as phenotypes may contain imper- 
fections. Anthropologists and biologists, not to mention evolutionary linguists 
and neuroscientists, consider that to be very unlikely; see the Hawks versus Jones
debate at Hawks (2008). 

Strictly speaking, the Turing bird which Smullyan defined as U would not 
be functionally equivalent to the Sage Bird Y named after Curry. We need 
the equivalent of Turing's (1937) definition: U ≡ (λxλy.y(xxy))(λxλy.y(xxy)).
From this we get Uf = [(λxλy.y(xxy))(λxλy.y(xxy))]f, which is equivalent to
f([λxλy.y(xxy)][λxλy.y(xxy)]f). It gives us a fixpoint combinator: Uf =
f(Uf). Like Y, U is infinitely typeable. Unlike Y, U is a supercombinator.
Smullyan might have been persuaded to call Y Yeşilbaş, Turkish for green duck
(literally ‘green head’), rather than the hapless Sage bird. The common confusion
about whether ducks are birds or birds are ducks (since we know that
geese and ostriches are ducks) seems fertile ground to breed recursion the
paradoxical way.
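
Turing's fixpoint combinator from this note can be tried out in a strict language, with one caveat. The Python sketch below is an assumption-laden illustration: because Python evaluates arguments eagerly, the inner self-application xxy is eta-expanded (wrapped in a lambda) so that it is only unfolded on demand; without that wrapper the definition would recurse forever. The names `turing_half`, `U` and `factorial` are hypothetical.

```python
# One half of Turing's combinator: \x.\y. y (x x y), with x x y eta-expanded
# to \v. x x y v so that strict evaluation does not loop.
turing_half = lambda x: lambda y: y(lambda v: x(x)(y)(v))

# U = (\x.\y. y (x x y)) (\x.\y. y (x x y)), so that U f = f (U f).
U = turing_half(turing_half)

# A non-recursive "step" function; U ties the recursive knot.
factorial = U(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

print(factorial(5))
```

Note that the step function supplies the base case; the combinator only provides the unfolding.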

This is of course true in programming as well. Programmers will remember the 
bitter experience of writing recursive programs without base cases, or with base 
cases that are not reachable. 

A simple version of c-command suffices for our purposes: x c-commands y in a 
structure if x does not dominate y, and the node immediately dominating x also 
dominates y. 
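
The definition can be stated as a short procedure over trees. The following Python sketch is illustrative only: the `Node` class, the function names and the toy S/NP/VP tree are assumptions introduced here, and "dominates" is taken to mean proper dominance (a node does not dominate itself).

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

def dominates(x, y):
    """True if x is a proper ancestor of y."""
    node = y.parent
    while node is not None:
        if node is x:
            return True
        node = node.parent
    return False

def c_commands(x, y):
    """x c-commands y if x does not dominate y and the node immediately
    dominating x also dominates y."""
    if x.parent is None:
        return False
    return not dominates(x, y) and dominates(x.parent, y)

# Toy structure: [S NP_subj [VP V NP_obj]]
subj, verb, obj = Node("NP_subj"), Node("V"), Node("NP_obj")
vp = Node("VP", [verb, obj])
s = Node("S", [subj, vp])

print(c_commands(subj, obj), c_commands(obj, subj))
```

In the toy tree the subject c-commands the object but not the other way around.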